TWO SAMPLE TESTS FOR HIGH-DIMENSIONAL
COVARIANCE MATRICES

By Jun Li and Song Xi Chen 1 

Iowa State University, and Peking University and Iowa State University 

We propose two tests for the equality of covariance matrices be- 
tween two high-dimensional populations. One test is on the whole 
variance-covariance matrices, and the other is on off-diagonal sub- 
matrices, which define the covariance between two nonoverlapping 
segments of the high-dimensional random vectors. The tests are ap- 
plicable (i) when the data dimension is much larger than the sample 
sizes, namely the "large p, small n" situations and (ii) without as- 
suming parametric distributions for the two populations. These two 
aspects surpass the capability of the conventional likelihood ratio test. 
The proposed tests can be used to test on covariances associated with 
gene ontology terms. 

1. Introduction. Modern statistical data are increasingly high dimen- 
sional, but with relatively small sample sizes. Genetic data typically carry 
thousands of dimensions for measurements on the genome. However, due 
to limited resources available to replicate study objects, the sample sizes 
are usually much smaller than the dimension. This is the so-called "large p, 
small n" paradigm. An enduring interest in Statistics is to know if two popu- 
lations share the same distribution or certain key distributional characteris- 
tics, for instance the mean or covariance. The two populations here can refer 
to two "treatments" in a study. As testing for equality of high-dimensional 
distributions is far more challenging than that for the fixed-dimensional 
data, testing for equality of key characteristics of the distributions is more 
achievable and desirable due to easy interpretation. There has been a set of 
research on inference for means of high-dimensional distributions either in 
the context of multiple testing, as in van der Laan and Bryan (2001), Donoho 
and Jin (2004), Fan, Hall and Yao (2007) and Hall and Jin (2008), or in 



Received July 2011; revised December 2011. 
Supported by an NSFC Key Grant 11131002.

AMS 2000 subject classifications. Primary 62H15; secondary 62G10, 62G20.
Key words and phrases. High-dimensional covariance, large p small n, likelihood ratio 
test, testing for gene-sets. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Statistics, 

2012, Vol. 40, No. 2, 908-940. This reprint differs from the original in pagination 

and typographic detail. 









the context of simultaneous multivariate testing as in Bai and Saranadasa 
(1996) and Chen and Qin (2010). See also Huang, Wang and Zhang (2005), 
Fan, Peng and Huang (2005) and Zhang and Huang (2008) for inference on 
high-dimensional conditional means. 

In addition to detecting difference among the population means, there is 
a strong motivation for comparing dependence among components of ran- 
dom vectors under different treatments, as high data dimensions can poten- 
tially increase the complexity of the dependence. In genomic studies, genetic 
measurements, either the micro-array expressions or the single nucleotide 
polymorphism (SNP) counts, may have an internal structure dictated by the 
genetic networks of living cells. And the variations and dependence among 
the measurements of the genes may be different under different biological 
conditions and treatments. For instance, some genes may be tightly corre- 
lated in the normal or less severe conditions, but they can become decoupled 
due to certain disease progression; see Shedden and Taylor (2004) for a dis- 
cussion. 

There have been advances on inference for high-dimensional covariance 
matrices. The probability limits and the limiting distributions of extreme 
eigenvalues of the sample covariance matrix based on the random matrix 
theory are developed in Bai (1993), Bai and Yin (1993), Tracy and Widom 
(1996), Johnstone (2001) and El Karoui (2007), Johnstone and Lu (2009), 
Bai and Silverstein (2010) and others. Wu and Pourahmadi (2003) and Bickel 
and Levina (2008a, 2008b) proposed consistent estimators to the popula- 
tion covariance matrices by either truncation or Cholesky decomposition. 
Fan, Fan and Lv (2008), Lam and Yao (2011) and Lam, Yao and Bathia 
(2011) considered covariance estimation under factor models. There are also 
developments in conducting LASSO-type regularization estimation of high- 
dimensional covariances in Huang et al. (2006) and Rothman, Levina and 
Zhu (2010). Despite these developments, it is still challenging to transform 
these results to test procedures on high-dimensional covariance matrices. 

As part of the effort in discovering significant differences between two
high-dimensional distributions, we develop in this paper two-sample test
procedures on high-dimensional covariance matrices. Let $X_{i1}, \ldots, X_{in_i}$
be an independent and identically distributed sample drawn from a
$p$-dimensional distribution $F_i$, for $i = 1$ and 2, respectively. Here the
dimensionality $p$ can be a lot larger than the two sample sizes $n_1$ and $n_2$
so that $p/n_i \to \infty$. Let $\mu_i$ and $\Sigma_i$ be, respectively, the mean
vector and variance-covariance matrix of the $i$th population. The primary
interest is to test

$$H_{0a}: \Sigma_1 = \Sigma_2 \quad \text{versus} \quad H_{1a}: \Sigma_1 \ne \Sigma_2. \tag{1.1}$$

Testing for the above high-dimensional hypotheses is a nontrivial statistical 
problem. Designed for fixed-dimensional data, the conventional likelihood 
ratio test [see Anderson (2003) for details] may be used for the above
hypothesis under $p < \min\{n_1, n_2\}$. If we let
$Q_i = \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)'$, the LR test
statistic is

$$\lambda_n = \frac{\prod_{i=1}^{2} |Q_i|^{n_i/2}}{|Q|^{n/2}} \cdot \frac{n^{pn/2}}{\prod_{i=1}^{2} n_i^{p n_i/2}},$$

where $Q = Q_1 + Q_2$ and $n = n_1 + n_2$. However, when $p > \min\{n_1, n_2\}$, at
least one of the sample covariance matrices $Q_i/(n_i - 1)$ is singular
[Dykstra (1970)]. This causes the LR statistic $-2\log(\lambda_n)$ to be either infinite
or undefined, which fundamentally alters the limiting behavior of the LR 
statistic. In an important development, Bai et al. (2009) demonstrated that
even when $p < \min\{n_1, n_2\}$ where $\lambda_n$ is properly defined, the test
encounters a power loss if $p \to \infty$ in such a manner that
$p/n_i \to c_i \in (0,1)$ for $i = 1$ and 2. By employing the theory of large
dimensional random matrices, Bai et al. (2009) proposed a correction to the LR
statistic and demonstrated that the corrected test is valid under
$p/n_i \to c_i \in (0,1)$. Schott (2007) proposed a test based on a metric that
measures the difference between the two sample covariance matrices by assuming
$p/n_i \to c_i \in [0,\infty)$ and the normal distributions. There are also one
sample tests for a high-dimensional variance-covariance $\Sigma$. Ledoit and
Wolf (2002) and Chen, Zhang and Zhong (2010) introduced tests for $\Sigma$
being spherical or the identity for normally distributed random vectors.
Ledoit and Wolf (2004) considered a class of covariance estimators which are
convex sums of $S_n$ and $I_p$ under moderate dimensionality ($p/n \to c$). Cai
and Jiang (2011) developed tests for $\Sigma$ having a banded diagonal
structure based on random matrix theory. Lan et al. (2010) developed a
bias-corrected test to examine the significance of the off-diagonal elements
of the residual covariance matrix. All these tests assume either normality or
moderate dimensionality such that $p/n \to c$ for a finite constant $c$, or both.

We develop in this paper two-sample tests on high-dimensional
variance-covariances without the normality assumption while allowing the
dimension to be much larger than the sample sizes. In addition to testing for
the whole variance-covariance matrices, we propose a test on the equality of
off-diagonal sub-matrices in $\Sigma_1$ and $\Sigma_2$. The interest in such a
test arises naturally in applications, when we are interested in knowing if
two segments of the high-dimensional data share the same covariance between
the two treatments. We will argue in Section 3 that the two tests on the
whole covariance and the off-diagonal sub-matrices may be used collectively
to reduce the dimensionality of the testing problem.

This paper is organized as follows. We propose the two-sample test for
the whole covariance matrices in Section 2, which includes the asymptotic
normality of the test statistic and a power evaluation. Properties of the test
for the off-diagonal sub-matrices are reported in Section 3. Results from
simulation studies are outlined in Section 4. Section 5 demonstrates how to
apply the proposed tests on a gene ontology data set for acute lymphoblastic
leukemia. All technical details are relegated to Section 6.

2. Test for high-dimensional variance covariance. The test statistic for
the hypothesis (1.1) is formulated by targeting
$\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$, the squared Frobenius norm of
$\Sigma_1 - \Sigma_2$. Although the Frobenius norm is large in magnitude
compared with other matrix norms, using it for testing brings two
advantages. One is that test statistics based on this norm are relatively
easier to analyze than those based on other norms, which is especially the
case when considering the limiting distribution of the test statistics. This
in turn facilitates the formulation of test procedures and power analysis, as
we will demonstrate later. The other advantage is that it can be used to
directly target certain sections of the covariance matrix, as shown in the
next section, which would be hard to accomplish with other norms.

As $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\} = \operatorname{tr}(\Sigma_1^2) + \operatorname{tr}(\Sigma_2^2) - 2\operatorname{tr}(\Sigma_1\Sigma_2)$,
we will construct estimators for each term. It is noted that
$\operatorname{tr}(S_{n_h}^2)$, where $S_{n_h}$ is the sample covariance of the
$h$th sample, is a poor estimator of $\operatorname{tr}(\Sigma_h^2)$ under high
dimensionality. The idea is to streamline terms in $\operatorname{tr}(S_{n_h}^2)$
so as to make it unbiased to $\operatorname{tr}(\Sigma_h^2)$ and easier to
analyze in subsequent asymptotic evaluations. We consider U-statistics of form
$\frac{1}{n_h(n_h-1)}\sum_{i \ne j}(X_{hi}'X_{hj})^2$, which is unbiased if
$\mu_h = 0$. To account for $\mu_h \ne 0$, we correct it with two other
U-statistics of order three and four, respectively, using an approach dating
back to Glasser (1961, 1962). Specifically, we propose

$$A_{n_h} = \frac{1}{n_h(n_h-1)} \sum_{i \ne j} (X_{hi}'X_{hj})^2
 - \frac{2}{n_h(n_h-1)(n_h-2)} \sum^{*}_{i,j,k} X_{hi}'X_{hj}X_{hj}'X_{hk}
 + \frac{1}{n_h(n_h-1)(n_h-2)(n_h-3)} \sum^{*}_{i,j,k,l} X_{hi}'X_{hj}X_{hk}'X_{hl} \tag{2.1}$$

to estimate $\operatorname{tr}(\Sigma_h^2)$. Throughout this paper we use
$\sum^{*}$ to denote summation over mutually distinct indices. For example,
$\sum^{*}_{i,j,k}$ means summation over $\{(i,j,k): i \ne j, j \ne k, k \ne i\}$.
Similarly, the estimator for $\operatorname{tr}(\Sigma_1\Sigma_2)$ is

$$C_{n_1 n_2} = \frac{1}{n_1 n_2} \sum_{i}\sum_{j} (X_{1i}'X_{2j})^2
 - \frac{1}{n_1 n_2 (n_1-1)} \sum^{*}_{i,k}\sum_{j} X_{1i}'X_{2j}X_{2j}'X_{1k}
 - \frac{1}{n_1 n_2 (n_2-1)} \sum^{*}_{i,k}\sum_{j} X_{2i}'X_{1j}X_{1j}'X_{2k}
 + \frac{1}{n_1 n_2 (n_1-1)(n_2-1)} \sum^{*}_{i,k}\sum^{*}_{j,l} X_{1i}'X_{2j}X_{1k}'X_{2l}. \tag{2.2}$$



There are other ways to attain estimators for $\operatorname{tr}(\Sigma_h^2)$
and $\operatorname{tr}(\Sigma_1\Sigma_2)$. In fact, there is a family of
estimators for $\operatorname{tr}(\Sigma_h^2)$ in the form of
$\operatorname{tr}(S_{n_h}^2) - \alpha_{n_h}\sum_{i=1}^{n_h}\operatorname{tr}\{(X_{hi}X_{hi}' - S_{n_h})^2\}$
where $\alpha_{n_h} = a/n_h^2$ for any constant $a$. A family
can be similarly formulated for $\operatorname{tr}(\Sigma_1\Sigma_2)$. It can be
shown that this family of estimators is asymptotically equivalent to the
proposed $A_{n_h}$ in the sense that they share the same leading order term.
However, this family is more complex than the proposed estimators.
The test statistic is

$$T_{n_1,n_2} = A_{n_1} + A_{n_2} - 2C_{n_1 n_2}, \tag{2.3}$$

which is unbiased for $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$. Besides the
unbiasedness, $T_{n_1,n_2}$ is invariant under the location shift and orthogonal
rotation. This means that we can assume without loss of generality that
$E(X_{ij}) = 0$ in the rest of the paper. As noted by a reviewer, the
computation of $T_{n_1,n_2}$ would be extremely heavy if the sample sizes $n_h$
are very large. Indeed, the computation burden comes from the last two sums in
$A_{n_h}$ and the last three in $C_{n_1 n_2}$, where the numbers of terms in the
summations are in the order of $n_h^3$ or $n_h^4$, respectively. Although the
main motivation was the "large p, small n" situations, we nevertheless require
$n_h \to \infty$ in our asymptotic justifications. A solution to alleviate the
computation burden can be found by noting that the last two terms in $A_{n_h}$
and the last three in $C_{n_1 n_2}$ are all of smaller order than the first,
under the assumption of $\mu_h = 0$. This means that we can first transform
each datum $X_{hi}$ to $X_{hi} - \bar{X}_{n_h}$, and then compute only the first
term in (2.1) and (2.2). This will reduce the computation to $O(n_h^2)$ without
affecting the asymptotic normality. The only price paid for such an operation
is that the modified statistic is no longer unbiased.
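As an illustration only, the sketch below evaluates (2.1)-(2.3) without enumerating index triples or quadruples. It assumes each sample is stored as an (n, p) NumPy array with n at least 4. The sums over distinct indices are rewritten in terms of row sums of the Gram matrix; these identities are a rearrangement introduced here for the example and are not taken from the paper, so the block also contains a literal brute-force evaluation against which they can be checked on small samples. All function names are hypothetical.

```python
import itertools

import numpy as np


def a_stat(x):
    """A_{n_h} of (2.1) for one sample x of shape (n, p), n >= 4."""
    n = x.shape[0]
    g = x @ x.T                        # Gram matrix, g[i, j] = X_i' X_j
    np.fill_diagonal(g, 0.0)           # every sum below runs over i != j
    s1 = np.sum(g ** 2)                # sum_{i != j} (X_i' X_j)^2
    s2 = np.sum(g.sum(axis=1) ** 2) - s1        # distinct (i, j, k)
    s3 = g.sum() ** 2 - 2.0 * s1 - 4.0 * s2     # distinct (i, j, k, l)
    return (s1 / (n * (n - 1))
            - 2.0 * s2 / (n * (n - 1) * (n - 2))
            + s3 / (n * (n - 1) * (n - 2) * (n - 3)))


def c_stat(x1, x2):
    """C_{n1 n2} of (2.2) from the cross Gram matrix h[i, j] = X_1i' X_2j."""
    n1, n2 = x1.shape[0], x2.shape[0]
    h = x1 @ x2.T
    t1 = np.sum(h ** 2)
    rows = np.sum(h.sum(axis=1) ** 2)
    cols = np.sum(h.sum(axis=0) ** 2)
    t2 = cols - t1                     # i != k in sample 1, any j in sample 2
    t3 = rows - t1                     # i != k in sample 2, any j in sample 1
    t4 = h.sum() ** 2 - rows - cols + t1        # i != k and j != l
    return (t1 / (n1 * n2)
            - t2 / (n1 * n2 * (n1 - 1))
            - t3 / (n1 * n2 * (n2 - 1))
            + t4 / (n1 * n2 * (n1 - 1) * (n2 - 1)))


def t_stat(x1, x2):
    """T_{n1,n2} = A_{n1} + A_{n2} - 2 C_{n1 n2}, as in (2.3)."""
    return a_stat(x1) + a_stat(x2) - 2.0 * c_stat(x1, x2)


def t_stat_bruteforce(x1, x2):
    """Literal evaluation of (2.1)-(2.3) by enumeration; tiny samples only."""
    def a_brute(x):
        n = x.shape[0]
        g = x @ x.T
        s1 = sum(g[i, j] ** 2 for i, j in itertools.permutations(range(n), 2))
        s2 = sum(g[i, j] * g[j, k]
                 for i, j, k in itertools.permutations(range(n), 3))
        s3 = sum(g[i, j] * g[k, l]
                 for i, j, k, l in itertools.permutations(range(n), 4))
        return (s1 / (n * (n - 1))
                - 2.0 * s2 / (n * (n - 1) * (n - 2))
                + s3 / (n * (n - 1) * (n - 2) * (n - 3)))

    n1, n2 = x1.shape[0], x2.shape[0]
    h = x1 @ x2.T
    pairs1 = list(itertools.permutations(range(n1), 2))
    pairs2 = list(itertools.permutations(range(n2), 2))
    t1 = sum(h[i, j] ** 2 for i in range(n1) for j in range(n2))
    t2 = sum(h[i, j] * h[k, j] for i, k in pairs1 for j in range(n2))
    t3 = sum(h[j, i] * h[j, k] for i, k in pairs2 for j in range(n1))
    t4 = sum(h[i, j] * h[k, l] for i, k in pairs1 for j, l in pairs2)
    c = (t1 / (n1 * n2) - t2 / (n1 * n2 * (n1 - 1))
         - t3 / (n1 * n2 * (n2 - 1))
         + t4 / (n1 * n2 * (n1 - 1) * (n2 - 1)))
    return a_brute(x1) + a_brute(x2) - 2.0 * c
```

Under these assumptions the cost is dominated by forming the Gram matrices, and the statistic stays unbiased because no term of (2.1) or (2.2) is dropped.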

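To establish the limiting distribution of $T_{n_1,n_2}$ so as to establish the
two sample test for the variance-covariance, we assume the following conditions:

A1. As $\min\{n_1, n_2\} \to \infty$, $n_1/(n_1 + n_2) \to \rho$ for a fixed
    constant $\rho \in (0,1)$.
A2. As $\min\{n_1, n_2\} \to \infty$, $p = p(n_1, n_2) \to \infty$, and for any
    $k$ and $l \in \{1,2\}$, $\operatorname{tr}(\Sigma_k\Sigma_l) \to \infty$ and

$$\operatorname{tr}\{(\Sigma_i\Sigma_j)(\Sigma_k\Sigma_l)\} = o\{\operatorname{tr}(\Sigma_i\Sigma_j)\operatorname{tr}(\Sigma_k\Sigma_l)\}. \tag{2.4}$$

A3. For each $i = 1$ or 2, $X_{ij} = \Gamma_i Z_{ij} + \mu_i$ where $\Gamma_i$
    is a $p \times m_i$ matrix such that $\Gamma_i\Gamma_i' = \Sigma_i$,
    $\{Z_{ij}\}_{j=1}^{n_i}$ are independent and identically distributed
    (i.i.d.) $m_i$-dimensional random vectors with $m_i \ge p$ and satisfy
    $E(Z_{ij}) = 0$, $\operatorname{Var}(Z_{ij}) = I_{m_i}$, the
    $m_i \times m_i$ identity matrix. Furthermore, if we write
    $Z_{ij} = (z_{ij1}, \ldots, z_{ijm_i})'$, then each $z_{ijk}$ has finite 8th
    moment, $E(z_{ijk}^4) = 3 + \Delta_i$ for some constant $\Delta_i$, and for
    any positive integers $q$ and $\alpha_l$ such that
    $\sum_{l=1}^{q}\alpha_l \le 8$,
    $E(z_{ijk_1}^{\alpha_1} \cdots z_{ijk_q}^{\alpha_q}) = E(z_{ijk_1}^{\alpha_1}) \cdots E(z_{ijk_q}^{\alpha_q})$
    for any $k_1 \ne k_2 \ne \cdots \ne k_q$.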

While Condition A1 is standard for two-sample asymptotic analysis,
A2 spells out the extent of high dimensionality and the dependence which can
be accommodated by the proposed tests. A key aspect is that it does not
impose any explicit relationships between $p$ and the sample sizes, but rather
requires a quite mild condition (2.4) regarding the covariances. To appreciate
(2.4), we note that if $i = j = k = l$, it has the form of
$\operatorname{tr}(\Sigma_i^4) = o\{\operatorname{tr}^2(\Sigma_i^2)\}$, which
is valid if all the eigenvalues of $\Sigma_i$ are uniformly bounded. Condition
(2.4) also makes the asymptotic study of the test statistic manageable under
high dimensionality. We note here that requiring
$\operatorname{tr}(\Sigma_k\Sigma_l) \to \infty$ is a precursor to (2.4). We do
not assume specific parametric distributions for the two samples. Instead, a
general multivariate model is assumed in A3, which was advocated in Bai and
Saranadasa (1996) for testing high-dimensional means. The model resembles
that of the factor model with $Z_{ij}$ representing the factors, except that
here we allow the number of factors $m_i$ to be at least as large as $p$. This
provides flexibility in accommodating a wider range of multivariate
distributions for the observed data $X_{ij}$.
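To make condition (2.4) concrete in the special case $i = j = k = l$, the following sketch evaluates the ratio $\operatorname{tr}(\Sigma^4)/\operatorname{tr}^2(\Sigma^2)$ for a tridiagonal covariance with bounded eigenvalues as $p$ grows. The covariance used is that of a first-order moving average, chosen here purely as an assumed example; the function names and the value of `theta` are hypothetical.

```python
import numpy as np


def ma1_covariance(p, theta):
    """Covariance of X_k = Z_k + theta * Z_{k+1}, k = 1..p, with unit-variance Z."""
    off = theta * np.ones(p - 1)
    return (1.0 + theta ** 2) * np.eye(p) + np.diag(off, 1) + np.diag(off, -1)


def a2_ratio(sigma):
    """tr(Sigma^4) / tr^2(Sigma^2), i.e. (2.4) with i = j = k = l."""
    s2 = sigma @ sigma
    return np.trace(s2 @ s2) / np.trace(s2) ** 2


# The ratio should shrink roughly like 1/p when the eigenvalues stay bounded.
for p in (50, 100, 200, 400):
    print(p, a2_ratio(ma1_covariance(p, theta=0.5)))
```

In this assumed setting the printed ratios decrease as $p$ increases, which is the behaviour (2.4) asks for.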

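Derivations leading to (6.4) in Section 6 show that, under A2 and A3, the
leading order variance of $T_{n_1,n_2}$ under either $H_{0a}$ or $H_{1a}$ is

$$\sigma^2_{n_1,n_2} = \sum_{i=1}^{2}\Bigl[\frac{4}{n_i^2}\operatorname{tr}^2(\Sigma_i^2)
 + \frac{8}{n_i}\operatorname{tr}\{(\Sigma_i^2 - \Sigma_1\Sigma_2)^2\}
 + \frac{4\Delta_i}{n_i}\operatorname{tr}\{\Gamma_i'(\Sigma_1 - \Sigma_2)\Gamma_i \circ \Gamma_i'(\Sigma_1 - \Sigma_2)\Gamma_i\}\Bigr]
 + \frac{8}{n_1 n_2}\operatorname{tr}^2(\Sigma_1\Sigma_2), \tag{2.5}$$

where $A \circ B = (a_{ij}b_{ij})$ for two matrices $A = (a_{ij})$ and
$B = (b_{ij})$. Note that for any symmetric matrix $A$,
$\operatorname{tr}(A \circ A) \le \operatorname{tr}(A^2)$. Hence,

$$\operatorname{tr}\{\Gamma_1'(\Sigma_1 - \Sigma_2)\Gamma_1 \circ \Gamma_1'(\Sigma_1 - \Sigma_2)\Gamma_1\} \le \operatorname{tr}\{(\Sigma_1^2 - \Sigma_1\Sigma_2)^2\}$$

and

$$\operatorname{tr}\{\Gamma_2'(\Sigma_1 - \Sigma_2)\Gamma_2 \circ \Gamma_2'(\Sigma_1 - \Sigma_2)\Gamma_2\} \le \operatorname{tr}\{(\Sigma_2^2 - \Sigma_1\Sigma_2)^2\}.$$

These together with the fact that $\Delta_i \ge -2$ ensure that
$\sigma^2_{n_1,n_2} > 0$. We note that the $\Gamma_i$-$Z_{ij}$ pair in Model A3
is not unique, and there are other pairs, say $\tilde\Gamma_i$ and
$\tilde Z_{ij}$, such that $X_{ij} = \tilde\Gamma_i \tilde Z_{ij}$. However, it
can be shown that the value of
$\Delta_i \operatorname{tr}\{\Gamma_i'(\Sigma_1 - \Sigma_2)\Gamma_i \circ \Gamma_i'(\Sigma_1 - \Sigma_2)\Gamma_i\}$
remains the same.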

The following theorem establishes the asymptotic normality of $T_{n_1,n_2}$.

Theorem 1. Under Conditions A1-A3, as $\min\{n_1, n_2\} \to \infty$,

$$\sigma_{n_1,n_2}^{-1}\bigl[T_{n_1,n_2} - \operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}\bigr] \xrightarrow{d} N(0,1).$$
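
It is noted that under $H_{0a}: \Sigma_1 = \Sigma_2 = \Sigma$, say,
$\sigma^2_{n_1,n_2}$ becomes

$$\sigma^2_{0,n_1,n_2} = 4\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)^2 \operatorname{tr}^2(\Sigma^2).$$

To formulate a test procedure, we need to estimate $\sigma_{0,n_1,n_2}$. As
$A_{n_1}$ and $A_{n_2}$ are unbiased estimators of
$\operatorname{tr}(\Sigma_1^2)$ and $\operatorname{tr}(\Sigma_2^2)$,
respectively, we will use
$\hat\sigma_{0,n_1,n_2} =: \frac{2}{n_2}A_{n_1} + \frac{2}{n_1}A_{n_2}$ as the
estimator. The following theorem shows that $\hat\sigma_{0,n_1,n_2}$ is
ratio-consistent to $\sigma_{0,n_1,n_2}$.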

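Theorem 2. Under Conditions A1-A3 and $H_{0a}$, as $\min\{n_1, n_2\} \to \infty$,

$$\frac{A_{n_i}}{\operatorname{tr}(\Sigma_i^2)} \xrightarrow{p} 1 \ \text{ for } i = 1 \text{ and } 2 \quad \text{and} \quad \frac{\hat\sigma_{0,n_1,n_2}}{\sigma_{0,n_1,n_2}} \xrightarrow{p} 1. \tag{2.6}$$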

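Applying Theorems 1 and 2, under $H_{0a}: \Sigma_1 = \Sigma_2$,

$$L_n = \frac{T_{n_1,n_2}}{\hat\sigma_{0,n_1,n_2}} \xrightarrow{d} N(0,1). \tag{2.7}$$

Hence, the proposed test with a nominal $\alpha$ level of significance rejects
$H_{0a}$ if $T_{n_1,n_2} \ge \hat\sigma_{0,n_1,n_2} z_\alpha$, where $z_\alpha$ is
the upper-$\alpha$ quantile of $N(0,1)$.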
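The rejection rule can be sketched in a few lines. The example assumes the values of $T_{n_1,n_2}$, $A_{n_1}$ and $A_{n_2}$ have already been computed and uses the standardisation $\hat\sigma_{0,n_1,n_2} = (2/n_2)A_{n_1} + (2/n_1)A_{n_2}$ described in the text; the function name, its return format and the guard against a nonpositive scale estimate are choices made here for illustration.

```python
from statistics import NormalDist


def covariance_test_decision(t_stat, a1, a2, n1, n2, alpha=0.05):
    """One-sided test of H_0a based on T / sigma0_hat, following (2.7)."""
    sigma0_hat = 2.0 * a1 / n2 + 2.0 * a2 / n1
    if sigma0_hat <= 0.0:
        # A_{n_h} is unbiased but not guaranteed positive in small samples.
        raise ValueError("nonpositive scale estimate; the test is not defined")
    statistic = t_stat / sigma0_hat
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # upper-alpha quantile
    p_value = 1.0 - NormalDist().cdf(statistic)
    return {"statistic": statistic,
            "p_value": p_value,
            "reject": statistic >= z_alpha}
```

The test is one-sided because $T_{n_1,n_2}$ targets the nonnegative quantity $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$, so only large values count as evidence against $H_{0a}$.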

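Let $\beta_{1,n_1,n_2}(\Sigma_1, \Sigma_2; \alpha) = P(T_{n_1,n_2}/\hat\sigma_{0,n_1,n_2} > z_\alpha \mid H_{1a})$
be the power of the test under $H_{1a}: \Sigma_1 \ne \Sigma_2$. From Theorems 1
and 2, the leading order power is

$$\Phi\Bigl(-Z_{n_1,n_2}(\Sigma_1,\Sigma_2)\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}}{\sigma_{n_1,n_2}}\Bigr), \tag{2.8}$$

where
$Z_{n_1,n_2}(\Sigma_1,\Sigma_2) = (\sigma_{n_1,n_2})^{-1}\{\frac{2}{n_2}\operatorname{tr}(\Sigma_1^2) + \frac{2}{n_1}\operatorname{tr}(\Sigma_2^2)\}$.
It is the case that $Z_{n_1,n_2}(\Sigma_1,\Sigma_2)$ is bounded. To appreciate
this, we note that
$\sigma^2_{n_1,n_2} \ge \frac{4}{n_1^2}\operatorname{tr}^2(\Sigma_1^2) + \frac{4}{n_2^2}\operatorname{tr}^2(\Sigma_2^2)$.
Let $\gamma_p = \operatorname{tr}(\Sigma_1^2)/\operatorname{tr}(\Sigma_2^2)$ and
$k_n = n_1/(n_1 + n_2)$, then

$$Z_{n_1,n_2}(\Sigma_1,\Sigma_2) \le \frac{(2/n_2)\operatorname{tr}(\Sigma_1^2) + (2/n_1)\operatorname{tr}(\Sigma_2^2)}{\sqrt{(4/n_1^2)\operatorname{tr}^2(\Sigma_1^2) + (4/n_2^2)\operatorname{tr}^2(\Sigma_2^2)}} =: R_n(\gamma_p),$$

where
$R_n(u) = \bigl(\frac{k_n}{1-k_n}u + 1\bigr)\bigl\{u^2 + \bigl(\frac{k_n}{1-k_n}\bigr)^2\bigr\}^{-1/2}$.
Since $R_n(u)$ is maximized uniquely at $u^* = \bigl(\frac{k_n}{1-k_n}\bigr)^3$,
$Z_{n_1,n_2}(\Sigma_1,\Sigma_2) \le \frac{\sqrt{k_n^4 + (1-k_n)^4}}{k_n(1-k_n)}$. Thus,

$$\beta_{1,n_1,n_2}(\Sigma_1,\Sigma_2;\alpha) \ge \Phi\Bigl(-\frac{\sqrt{k_n^4 + (1-k_n)^4}}{k_n(1-k_n)}\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}}{\sigma_{n_1,n_2}}\Bigr), \tag{2.9}$$

implying the power is bounded from below by the probability on the right-hand
side.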
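The bound on the scale ratio can be checked numerically. The sketch below takes $R_n(u) = (ru + 1)(u^2 + r^2)^{-1/2}$ with $r = k_n/(1-k_n)$, locates its maximum on a grid, and compares the result with the maximizer $r^3$ and the maximum $\sqrt{k_n^4 + (1-k_n)^4}/\{k_n(1-k_n)\}$. The value $k_n = 0.4$ and the grid are arbitrary choices made for this example.

```python
import numpy as np


def r_n(u, k_n):
    """R_n(u) = (r u + 1) / sqrt(u^2 + r^2) with r = k_n / (1 - k_n)."""
    r = k_n / (1.0 - k_n)
    return (r * u + 1.0) / np.sqrt(u ** 2 + r ** 2)


def r_n_max(k_n):
    """Closed-form maximum sqrt(k^4 + (1 - k)^4) / {k (1 - k)}."""
    return np.sqrt(k_n ** 4 + (1.0 - k_n) ** 4) / (k_n * (1.0 - k_n))


k_n = 0.4
u_grid = np.linspace(1e-3, 20.0, 200_001)
values = r_n(u_grid, k_n)
u_best = u_grid[np.argmax(values)]

print("grid maximizer:", u_best, "closed form:", (k_n / (1.0 - k_n)) ** 3)
print("grid maximum:  ", values.max(), "closed form:", r_n_max(k_n))
```

Setting the derivative of $R_n$ to zero gives $r(u^2 + r^2) = (ru + 1)u$, that is $u = r^3$, which is what the grid search is meant to confirm.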



Both (2.8) and (2.9) indicate that
$\mathrm{SNR}_1(\Sigma_1,\Sigma_2) =: \operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}/\sigma_{n_1,n_2}$
is instrumental in determining the power of the test. We term
$\mathrm{SNR}_1(\Sigma_1,\Sigma_2)$ as the signal-to-noise ratio for the current
testing problem since $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$ may be
viewed as the signal while $\sigma_{n_1,n_2}$ may be viewed as the level of the
noise. If the signal is strong or the noise is weak so that the
signal-to-noise ratio diverges to infinity, the power will converge to 1. If
the signal-to-noise ratio diminishes to 0, the test will not be powerful and
cannot distinguish $H_{0a}$ from $H_{1a}$. We note that

$$\sigma^2_{n_1,n_2} \le 4\Bigl\{\frac{1}{n_1}\operatorname{tr}(\Sigma_1^2) + \frac{1}{n_2}\operatorname{tr}(\Sigma_2^2)\Bigr\}^2
 + \max\{8 + 4\Delta_1, 8 + 4\Delta_2\}\Bigl\{\frac{1}{n_1}\operatorname{tr}(\Sigma_1^2) + \frac{1}{n_2}\operatorname{tr}(\Sigma_2^2)\Bigr\}\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}.$$

Let
$\delta_{1,n} = \{\frac{1}{n_1}\operatorname{tr}(\Sigma_1^2) + \frac{1}{n_2}\operatorname{tr}(\Sigma_2^2)\}/\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$,
then

$$\mathrm{SNR}_1(\Sigma_1,\Sigma_2) \ge \bigl[4\delta_{1,n}^2 + \max\{8 + 4\Delta_1, 8 + 4\Delta_2\}\delta_{1,n}\bigr]^{-1/2}.$$

Thus, if the difference between $\Sigma_1$ and $\Sigma_2$ is not too small so that

$$\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\} \text{ is at the same or a larger order of } \frac{1}{n_1}\operatorname{tr}(\Sigma_1^2) + \frac{1}{n_2}\operatorname{tr}(\Sigma_2^2), \tag{2.10}$$

the test will be powerful. Condition (2.10) is trivially true for
fixed-dimensional data while $n_i \to \infty$. For high-dimensional data, it is
less automatic as $\operatorname{tr}(\Sigma_i^2)$ can diverge. To gain further
insight on (2.10), let $\lambda_{i1} \le \lambda_{i2} \le \cdots \le \lambda_{ip}$
be the eigenvalues of $\Sigma_i$. Then, a sufficient condition for the test to
have a nontrivial power is
$\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\} = O\{\frac{1}{n_1}\sum_{k=1}^{p}\lambda_{1k}^2 + \frac{1}{n_2}\sum_{k=1}^{p}\lambda_{2k}^2\}$.
If all the eigenvalues of $\Sigma_1$ and $\Sigma_2$ are bounded away from zero
and infinity, (2.10) becomes
$\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\} = O(n^{-1}p)$. Let
$\delta_\beta = p^{-1}\sqrt{\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}}$ be the
average signal. Then the test has nontrivial power if $\delta_\beta$ is at
least at the order of $n^{-1/2}p^{-1/2}$, which is actually smaller than the
conventional order of $n^{-1/2}$ for fixed-dimension situations. This partially
reflects the fact that high data dimensionality is not entirely a curse, as
there is more data information available as well. If the covariance matrix is
believed to have certain structure, for instance banded or bandable in the
sense of Bickel and Levina (2008a), we may modify the test statistic so that
the comparison of the two covariance matrices is made in the "important
regions" under the structure. The modification can be in the form of
thresholding, a topic we will not elaborate on in this paper; see Cai, Liu and
Xia (2011) for research in this direction.
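For a rough numerical feel of the leading order power, the sketch below evaluates $\Phi(-Z z_\alpha + \mathrm{SNR}_1)$ with the variance taken as the sum over $i$ of $(4/n_i^2)\operatorname{tr}^2(\Sigma_i^2) + (8/n_i)\operatorname{tr}\{(\Sigma_i^2 - \Sigma_1\Sigma_2)^2\}$ plus $(8/(n_1 n_2))\operatorname{tr}^2(\Sigma_1\Sigma_2)$. This is our reading of (2.5) with $\Delta_1 = \Delta_2 = 0$ (for instance Gaussian $Z_{ij}$), an assumption that removes the Hadamard-product term. The covariance pair, an identity matrix against a tridiagonal moving-average covariance, mirrors the design behind Table 1; function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def ma1_covariance(p, theta):
    """Covariance of X_k = Z_k + theta * Z_{k+1}, k = 1..p, with unit-variance Z."""
    off = theta * np.ones(p - 1)
    return (1.0 + theta ** 2) * np.eye(p) + np.diag(off, 1) + np.diag(off, -1)


def leading_order_power(sigma1, sigma2, n1, n2, alpha=0.05):
    """Phi(-Z z_alpha + SNR_1) with Delta_1 = Delta_2 = 0 assumed."""
    tr = np.trace
    s11, s22, s12 = sigma1 @ sigma1, sigma2 @ sigma2, sigma1 @ sigma2
    d1, d2 = s11 - s12, s22 - s12
    variance = (4.0 / n1 ** 2 * tr(s11) ** 2 + 8.0 / n1 * tr(d1 @ d1)
                + 4.0 / n2 ** 2 * tr(s22) ** 2 + 8.0 / n2 * tr(d2 @ d2)
                + 8.0 / (n1 * n2) * tr(s12) ** 2)
    sd = np.sqrt(variance)
    diff = sigma1 - sigma2
    snr = tr(diff @ diff) / sd                        # SNR_1
    z_ratio = (2.0 / n2 * tr(s11) + 2.0 / n1 * tr(s22)) / sd
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf(float(-z_ratio * z_alpha + snr))


p, n = 40, 60
for theta in (0.5, 0.3, 0.2):
    power = leading_order_power(np.eye(p), ma1_covariance(p, theta), n, n)
    print(theta, round(power, 3))
```

The printed values are asymptotic approximations under the stated assumptions; they can be set beside the "Proposed" rows of Table 1 as a plausibility check rather than as a reproduction of the simulation.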

3. Test for covariance between two sub-vectors. Let
$X_{ij} = (X_{ij}^{(1)\prime}, X_{ij}^{(2)\prime})'$ be a partition of the
original data vector into sub-vectors of dimensions $p_1$ and $p_2$, and
$\Sigma_{i,12} = \operatorname{Cov}(X_{ij}^{(1)}, X_{ij}^{(2)})$ be the
covariance between the sub-vectors. The focus in this section is to develop a
test procedure for $H_{0b}: \Sigma_{1,12} = \Sigma_{2,12}$. Testing for such a
hypothesis is of importance in its own right, for instance in detecting
changes in correlation between two groups of genes under two treatment
regimes. It can also be viewed as part of the effort in reducing the
dimensionality in testing high-dimensional variance-covariances. To elaborate
on this, consider the partition of $\Sigma_i$,

$$\Sigma_i = \begin{pmatrix} \Sigma_{i,11} & \Sigma_{i,12} \\ \Sigma_{i,12}' & \Sigma_{i,22} \end{pmatrix}, \tag{3.1}$$

induced by the partition of the data vectors. Instead of testing on the whole
matrices $\Sigma_1 = \Sigma_2$, we can first test separately on the two
diagonal blocks $\Sigma_{1,ll} = \Sigma_{2,ll}$ for $l = 1$ and 2, by employing
the test developed in the previous section based on the sub-vectors of the two
sample data respectively. Then, we can test for the off-diagonal blocks
$H_{0b}: \Sigma_{1,12} = \Sigma_{2,12}$ using a test procedure to be developed
in this section.

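The partition of data vectors also induces a partition of the multivariate
model in A3 so that

$$X_{ij}^{(1)} = \Gamma_i^{(1)} Z_{ij} + \mu_i^{(1)} \quad \text{and} \quad X_{ij}^{(2)} = \Gamma_i^{(2)} Z_{ij} + \mu_i^{(2)}, \tag{3.2}$$

where $\Gamma_i^{(1)}$ is $p_1 \times m_i$ and $\Gamma_i^{(2)}$ is
$p_2 \times m_i$ such that
$\Gamma_i' = (\Gamma_i^{(1)\prime}, \Gamma_i^{(2)\prime})$.

We are interested in testing $H_{0b}: \Sigma_{1,12} = \Sigma_{2,12}$ vs
$H_{1b}: \Sigma_{1,12} \ne \Sigma_{2,12}$. The test statistic is aimed at

$$\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}
 = \operatorname{tr}(\Sigma_{1,12}\Sigma_{1,12}') + \operatorname{tr}(\Sigma_{2,12}\Sigma_{2,12}') - 2\operatorname{tr}(\Sigma_{1,12}\Sigma_{2,12}'), \tag{3.3}$$

a discrepancy measure between $\Sigma_{1,12}$ and $\Sigma_{2,12}$.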

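With the same considerations as those when we proposed the estimators
in (2.1) and (2.2), we estimate $\operatorname{tr}(\Sigma_{h,12}\Sigma_{h,12}')$ by

$$U_{n_h} = \frac{1}{n_h(n_h-1)} \sum_{i \ne j} X_{hi}^{(1)\prime}X_{hj}^{(1)}X_{hj}^{(2)\prime}X_{hi}^{(2)}
 - \frac{2}{n_h(n_h-1)(n_h-2)} \sum^{*}_{i,j,k} X_{hi}^{(1)\prime}X_{hj}^{(1)}X_{hj}^{(2)\prime}X_{hk}^{(2)}
 + \frac{1}{n_h(n_h-1)(n_h-2)(n_h-3)} \sum^{*}_{i,j,k,l} X_{hi}^{(1)\prime}X_{hj}^{(1)}X_{hk}^{(2)\prime}X_{hl}^{(2)}, \tag{3.4}$$

and estimate $\operatorname{tr}(\Sigma_{1,12}\Sigma_{2,12}')$ by

$$W_{n_1 n_2} = \frac{1}{n_1 n_2} \sum_{i}\sum_{j} X_{1i}^{(1)\prime}X_{2j}^{(1)}X_{2j}^{(2)\prime}X_{1i}^{(2)}
 - \frac{1}{n_1 n_2 (n_1-1)} \sum^{*}_{i,k}\sum_{j} X_{1i}^{(1)\prime}X_{2j}^{(1)}X_{2j}^{(2)\prime}X_{1k}^{(2)}
 - \frac{1}{n_1 n_2 (n_2-1)} \sum^{*}_{i,k}\sum_{j} X_{2i}^{(1)\prime}X_{1j}^{(1)}X_{1j}^{(2)\prime}X_{2k}^{(2)}
 + \frac{1}{n_1 n_2 (n_1-1)(n_2-1)} \sum^{*}_{i,k}\sum^{*}_{j,l} X_{1i}^{(1)\prime}X_{2j}^{(1)}X_{2l}^{(2)\prime}X_{1k}^{(2)}. \tag{3.5}$$

Both $U_{n_h}$ and $W_{n_1 n_2}$ are linear combinations of U-statistics.

Combining these estimators together leads to an unbiased estimator of
$\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}$,

$$S_{n_1,n_2} = U_{n_1} + U_{n_2} - 2W_{n_1 n_2}, \tag{3.6}$$

which is also invariant under the location shift and orthogonal rotations.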
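The off-diagonal statistic can be evaluated in the same spirit as the whole-matrix one. The sketch below assumes each sample is an (n, p) NumPy array whose first `p1` columns form the first sub-vector, and it writes the sums over distinct indices in (3.4) and (3.5) through products of two Gram matrices. These rearrangements are introduced here for illustration and are not taken from the paper; all names are hypothetical.

```python
import numpy as np


def u_stat(xa, xb):
    """U_{n_h} of (3.4); xa is the (n, p1) block, xb the (n, p2) block."""
    n = xa.shape[0]
    g1, g2 = xa @ xa.T, xb @ xb.T
    np.fill_diagonal(g1, 0.0)
    np.fill_diagonal(g2, 0.0)
    f = np.sum(g1 * g2)                             # pairs i != j
    r = np.sum(g1.sum(axis=1) * g2.sum(axis=1))
    s2 = r - f                                      # distinct (i, j, k)
    s3 = g1.sum() * g2.sum() + 2.0 * f - 4.0 * r    # distinct (i, j, k, l)
    return (f / (n * (n - 1))
            - 2.0 * s2 / (n * (n - 1) * (n - 2))
            + s3 / (n * (n - 1) * (n - 2) * (n - 3)))


def w_stat(x1a, x1b, x2a, x2b):
    """W_{n1 n2} of (3.5) from the two cross Gram matrices."""
    n1, n2 = x1a.shape[0], x2a.shape[0]
    h1 = x1a @ x2a.T                                # first sub-vectors
    h2 = x1b @ x2b.T                                # second sub-vectors
    f = np.sum(h1 * h2)
    rows = np.sum(h1.sum(axis=1) * h2.sum(axis=1))
    cols = np.sum(h1.sum(axis=0) * h2.sum(axis=0))
    t2 = cols - f                                   # i != k in sample 1
    t3 = rows - f                                   # i != k in sample 2
    t4 = h1.sum() * h2.sum() - rows - cols + f      # i != k and j != l
    return (f / (n1 * n2)
            - t2 / (n1 * n2 * (n1 - 1))
            - t3 / (n1 * n2 * (n2 - 1))
            + t4 / (n1 * n2 * (n1 - 1) * (n2 - 1)))


def s_stat(x1, x2, p1):
    """S_{n1,n2} = U_{n1} + U_{n2} - 2 W_{n1 n2}, as in (3.6)."""
    x1a, x1b = x1[:, :p1], x1[:, p1:]
    x2a, x2b = x2[:, :p1], x2[:, p1:]
    return (u_stat(x1a, x1b) + u_stat(x2a, x2b)
            - 2.0 * w_stat(x1a, x1b, x2a, x2b))
```

A quick sanity check under these assumptions is that adding a constant vector to every observation of either sample leaves `s_stat` unchanged up to rounding, in line with the stated location invariance.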

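To establish the asymptotic normality of $S_{n_1,n_2}$, we need an extra
assumption regarding the off-diagonal sub-matrices.

A4. As $\min\{n_1, n_2\} \to \infty$, for any $i$, $j$, $k$ and $l \in \{1,2\}$,

$$\operatorname{tr}(\Sigma_{i,11}\Sigma_{j,12}\Sigma_{k,22}\Sigma_{l,12}') = o\{\operatorname{tr}(\Sigma_{i,11}\Sigma_{j,11})\operatorname{tr}(\Sigma_{k,22}\Sigma_{l,22})\}. \tag{3.7}$$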

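Derivations leading to (6.5) in Section 6 show that, under A2, A3 and A4,
the leading order variance of $S_{n_1,n_2}$ is

$$\omega^2_{n_1,n_2} = \sum_{i=1}^{2}\Bigl[\frac{2}{n_i^2}\operatorname{tr}^2(\Sigma_{i,12}\Sigma_{i,12}') + \frac{2}{n_i^2}\operatorname{tr}(\Sigma_{i,11}^2)\operatorname{tr}(\Sigma_{i,22}^2)
 + \frac{4}{n_i}\operatorname{tr}\{(\Sigma_{i,12}\Sigma_{1,12}' - \Sigma_{i,12}\Sigma_{2,12}')^2\}
 + \frac{4}{n_i}\operatorname{tr}\{(\Sigma_{i,11}\Sigma_{1,12} - \Sigma_{i,11}\Sigma_{2,12})(\Sigma_{i,22}\Sigma_{1,12}' - \Sigma_{i,22}\Sigma_{2,12}')\}
 + \frac{4\Delta_i}{n_i}\operatorname{tr}\{\Gamma_i^{(1)\prime}(\Sigma_{1,12} - \Sigma_{2,12})\Gamma_i^{(2)} \circ \Gamma_i^{(1)\prime}(\Sigma_{1,12} - \Sigma_{2,12})\Gamma_i^{(2)}\}\Bigr]
 + \frac{4}{n_1 n_2}\operatorname{tr}^2(\Sigma_{1,12}\Sigma_{2,12}') + \frac{4}{n_1 n_2}\operatorname{tr}(\Sigma_{1,11}\Sigma_{2,11})\operatorname{tr}(\Sigma_{1,22}\Sigma_{2,22}). \tag{3.8}$$

Similarly to the analysis on $T_{n_1,n_2}$ in the previous section, the
asymptotic normality of $S_{n_1,n_2}$ can be established in the following
theorem.

Theorem 3. Under Conditions A1-A4, as $\min\{n_1, n_2\} \to \infty$,

$$\omega_{n_1,n_2}^{-1}\bigl[S_{n_1,n_2} - \operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}\bigr] \xrightarrow{d} N(0,1).$$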

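Under $H_{0b}: \Sigma_{1,12} = \Sigma_{2,12} = \Sigma_{12}$, say,
$\omega^2_{n_1,n_2}$ becomes

$$\omega^2_{0,n_1,n_2} = 2\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)^2\operatorname{tr}^2(\Sigma_{12}\Sigma_{12}')
 + 2\sum_{i=1}^{2}\frac{1}{n_i^2}\operatorname{tr}(\Sigma_{i,11}^2)\operatorname{tr}(\Sigma_{i,22}^2)
 + \frac{4}{n_1 n_2}\operatorname{tr}(\Sigma_{1,11}\Sigma_{2,11})\operatorname{tr}(\Sigma_{1,22}\Sigma_{2,22}). \tag{3.9}$$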




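In order to formulate a test procedure, $\omega^2_{0,n_1,n_2}$ needs to be
estimated. An unbiased estimator of $\operatorname{tr}(\Sigma_{h,ll}^2)$, for
$h = 1$ or 2 and $l = 1$ or 2, is

$$A_{n_h}^{(l)} = \frac{1}{n_h(n_h-1)} \sum_{i \ne j} (X_{hi}^{(l)\prime}X_{hj}^{(l)})^2
 - \frac{2}{n_h(n_h-1)(n_h-2)} \sum^{*}_{i,j,k} X_{hi}^{(l)\prime}X_{hj}^{(l)}X_{hj}^{(l)\prime}X_{hk}^{(l)}
 + \frac{1}{n_h(n_h-1)(n_h-2)(n_h-3)} \sum^{*}_{i,j,k,r} X_{hi}^{(l)\prime}X_{hj}^{(l)}X_{hk}^{(l)\prime}X_{hr}^{(l)}.$$

Similarly, an unbiased estimator of
$\operatorname{tr}(\Sigma_{1,hh}\Sigma_{2,hh})$, for $h = 1$ or 2, is

$$C_{n_1 n_2}^{(h)} = \frac{1}{n_1 n_2} \sum_{i}\sum_{j} (X_{1i}^{(h)\prime}X_{2j}^{(h)})^2
 - \frac{1}{n_1 n_2 (n_1-1)} \sum^{*}_{i,k}\sum_{j} X_{1i}^{(h)\prime}X_{2j}^{(h)}X_{2j}^{(h)\prime}X_{1k}^{(h)}
 - \frac{1}{n_1 n_2 (n_2-1)} \sum^{*}_{i,k}\sum_{j} X_{2i}^{(h)\prime}X_{1j}^{(h)}X_{1j}^{(h)\prime}X_{2k}^{(h)}
 + \frac{1}{n_1 n_2 (n_1-1)(n_2-1)} \sum^{*}_{i,k}\sum^{*}_{j,l} X_{1i}^{(h)\prime}X_{2j}^{(h)}X_{1k}^{(h)\prime}X_{2l}^{(h)}.$$

Then under $H_{0b}$, an unbiased estimator of $\omega^2_{0,n_1,n_2}$ is

$$\hat\omega^2_{0,n_1,n_2} = 2\Bigl(\frac{U_{n_1}}{n_2} + \frac{U_{n_2}}{n_1}\Bigr)^2 + \frac{2}{n_1^2}A_{n_1}^{(1)}A_{n_1}^{(2)} + \frac{2}{n_2^2}A_{n_2}^{(1)}A_{n_2}^{(2)} + \frac{4}{n_1 n_2}C_{n_1 n_2}^{(1)}C_{n_1 n_2}^{(2)}.$$

The following theorem shows that $\hat\omega^2_{0,n_1,n_2}$ is ratio-consistent
to $\omega^2_{0,n_1,n_2}$.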



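Theorem 4. Under Conditions A1-A4 and $H_{0b}: \Sigma_{1,12} = \Sigma_{2,12}$,

$$\frac{\hat\omega^2_{0,n_1,n_2}}{\omega^2_{0,n_1,n_2}} \xrightarrow{p} 1.$$

Applying Theorems 3 and 4, we have, under $H_{0b}$,

$$\frac{S_{n_1,n_2}}{\hat\omega_{0,n_1,n_2}} \xrightarrow{d} N(0,1).$$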



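This suggests an $\alpha$-level test that rejects $H_{0b}$ if
$S_{n_1,n_2} > \hat\omega_{0,n_1,n_2} z_\alpha$. The power of the proposed test
under $H_{1b}: \Sigma_{1,12} \ne \Sigma_{2,12}$ is

$$\beta_{2,n_1,n_2}(\Sigma_{1,12}, \Sigma_{2,12}; \alpha) = P(S_{n_1,n_2}/\hat\omega_{0,n_1,n_2} > z_\alpha \mid H_{1b}).$$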
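As a hedged sketch of how the pieces combine, the functions below assemble a variance estimate of the form $2(U_{n_1}/n_2 + U_{n_2}/n_1)^2 + (2/n_1^2)A_{n_1}^{(1)}A_{n_1}^{(2)} + (2/n_2^2)A_{n_2}^{(1)}A_{n_2}^{(2)} + (4/(n_1 n_2))C_{n_1 n_2}^{(1)}C_{n_1 n_2}^{(2)}$ and apply the one-sided rejection rule. This form is our reading of the estimator in the text, chosen so that it matches (3.9) term by term under $H_{0b}$; the argument layout and function names are assumptions made for the example, and the inputs are taken as already computed.

```python
from statistics import NormalDist


def omega0_hat_sq(u1, u2, a1, a2, c12, n1, n2):
    """Variance estimate under H_0b.

    a1 = (A_{n1}^{(1)}, A_{n1}^{(2)}), a2 = (A_{n2}^{(1)}, A_{n2}^{(2)}),
    c12 = (C_{n1 n2}^{(1)}, C_{n1 n2}^{(2)}).
    """
    return (2.0 * (u1 / n2 + u2 / n1) ** 2
            + 2.0 / n1 ** 2 * a1[0] * a1[1]
            + 2.0 / n2 ** 2 * a2[0] * a2[1]
            + 4.0 / (n1 * n2) * c12[0] * c12[1])


def offdiag_test_decision(s_stat, omega0_sq, alpha=0.05):
    """Reject H_0b when S / omega0_hat exceeds the upper-alpha normal quantile."""
    if omega0_sq <= 0.0:
        # Products of unbiased estimators can be nonpositive in small samples.
        raise ValueError("nonpositive variance estimate; the test is not defined")
    statistic = s_stat / omega0_sq ** 0.5
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return {"statistic": statistic,
            "p_value": 1.0 - NormalDist().cdf(statistic),
            "reject": statistic > z_alpha}
```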

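From Theorems 3 and 4, the leading order power is

$$\Phi\Bigl(-\frac{\tilde\omega_{n_1,n_2}}{\omega_{n_1,n_2}}\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}}{\omega_{n_1,n_2}}\Bigr),$$

where

$$\tilde\omega^2_{n_1,n_2} = 2\Bigl\{\frac{1}{n_2}\operatorname{tr}(\Sigma_{1,12}\Sigma_{1,12}') + \frac{1}{n_1}\operatorname{tr}(\Sigma_{2,12}\Sigma_{2,12}')\Bigr\}^2
 + \frac{2}{n_1^2}\operatorname{tr}(\Sigma_{1,11}^2)\operatorname{tr}(\Sigma_{1,22}^2)
 + \frac{2}{n_2^2}\operatorname{tr}(\Sigma_{2,11}^2)\operatorname{tr}(\Sigma_{2,22}^2)
 + \frac{4}{n_1 n_2}\operatorname{tr}(\Sigma_{1,11}\Sigma_{2,11})\operatorname{tr}(\Sigma_{1,22}\Sigma_{2,22}).$$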

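Let $\eta_p = \operatorname{tr}(\Sigma_{1,12}\Sigma_{1,12}')/\operatorname{tr}(\Sigma_{2,12}\Sigma_{2,12}')$.
It may be shown that

$$\frac{\tilde\omega_{n_1,n_2}}{\omega_{n_1,n_2}} \le R_n(\eta_p),$$

where $R_n(\cdot)$ is the same function defined in Section 2. Hence,
asymptotically,

$$\beta_{2,n_1,n_2}(\Sigma_{1,12}, \Sigma_{2,12}; \alpha) \ge \Phi\Bigl(-\frac{\sqrt{k_n^4 + (1-k_n)^4}}{k_n(1-k_n)}\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}}{\omega_{n_1,n_2}}\Bigr).$$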

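This implies that

$$\mathrm{SNR}_2 =: \operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}/\omega_{n_1,n_2}$$

is the key quantity that determines the power of the test. Furthermore, let

$$\delta_{2,n} = \frac{(1/n_1)\operatorname{tr}(\Sigma_{1,11})\operatorname{tr}(\Sigma_{1,22}) + (1/n_2)\operatorname{tr}(\Sigma_{2,11})\operatorname{tr}(\Sigma_{2,22})}{\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}}.$$

It can be shown that

$$\mathrm{SNR}_2 \ge \bigl[4\delta_{2,n}^2 + \max\{8 + 4\Delta_1, 8 + 4\Delta_2\}\delta_{2,n}\bigr]^{-1/2}. \tag{3.10}$$

Hence, the test is powerful if the difference between $\Sigma_{1,12}$ and
$\Sigma_{2,12}$ is not too small so that
$\operatorname{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}$
is at the order of
$\sum_{i=1}^{2}\frac{1}{n_i}\operatorname{tr}(\Sigma_{i,11})\operatorname{tr}(\Sigma_{i,22})$
or larger. A further analysis on the power, similar to that given at the end
of the last section, can be made. For the sake of brevity, we do not report it
here.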

4. Simulation studies. We report results from simulation experiments 
which were designed to evaluate the performance of the two proposed tests. 
A range of dimensionality and sample sizes was considered which allowed p 
to increase as the sample sizes were increased. This was designed to confirm 
the asymptotic results reported in the previous sections. 

We first considered the test for $H_{0a}: \Sigma_1 = \Sigma_2$ regarding the
whole variance-covariance matrices. To compare with the conventional
likelihood ratio (LR) test and the corrected LR test proposed by Bai et al.
(2009), we first considered cases of $p < \min\{n_1, n_2\}$ and normally
distributed data. Specifically, to create the null hypothesis, we simulated
both samples from the $p$-dimensional standard normal distribution. To
evaluate the power of the three tests, we set the first population to be
$p$-dimensional standard normal while simulating the second population
according to

$$X_{ijk} = Z_{ijk} + \theta_1 Z_{ijk+1}, \tag{4.1}$$

where $\{Z_{ijk}\}$ were i.i.d. standard normally distributed, and
$\theta_1 = 0.5$, 0.3 and 0.2, respectively. As $\theta_1$ was decreased, the
signal strength for the test became weaker. We chose
$(p, n_1, n_2) = (40, 60, 60)$, $(80, 120, 120)$ and $(120, 180, 180)$,
respectively. The empirical size and power for the three tests are reported
in Table 1. All the simulation results reported in this section were based on
1000 simulations with the nominal significance level set at 5%.

Table 1

Empirical sizes and powers of the conventional likelihood ratio (LR), the corrected
likelihood ratio (CLR) and the proposed tests (Proposed) for the variance-covariance,
based on 1000 replications with normally distributed $\{Z_{ijk}\}$

                                                       Power
(p, n1, n2)       Methods     Size     theta1 = 0.5   theta1 = 0.3   theta1 = 0.2

(40, 60, 60)      LRT         1        1              1              1
                  CLRT        0.043    0.999          0.509          0.172
                  Proposed    0.052    0.999          0.734          0.271
(80, 120, 120)    LRT         1        1              1              1
                  CLRT        0.045    1              0.946          0.421
                  Proposed    0.053    1              0.997          0.713
(120, 180, 180)   LRT         1        1              1              1
                  CLRT        0.062    1              1              0.713
                  Proposed    0.045    1              1              0.958

We then carried out simulations for situations where $p$ was much larger
than the sample sizes. In this case, only the proposed test was considered,
as both the LR and the corrected LR tests were no longer applicable. We
chose a set of data dimensions from 32 to 700, while the sample sizes ranged
from 20 to 100, respectively. We considered the moving average model (4.1)
with $\theta_1 = 2$ as the null model of both populations for size evaluation.
To assess the power performance, the first population was generated according
to (4.1), while the second was from

$$X_{ijk} = Z_{ijk} + \theta_1 Z_{ijk+1} + \theta_2 Z_{ijk+2}, \tag{4.2}$$

where $\theta_1 = 2$ and $\theta_2 = 1$. Three combinations of distributions
were experimented with for the i.i.d. sequences $\{Z_{ijk}\}_{k=1}^{p}$ in
models (4.1) and (4.2), respectively. They were: (i) both sequences were
standard normal; (ii) the centralized Gamma(4, 0.5) for Sample 1 and the
centralized Gamma(0.5, $\sqrt{2}$) for Sample 2; (iii) the standard normal for
Sample 1 and the centralized Gamma(0.5, $\sqrt{2}$) for Sample 2. The last two
combinations were designed to assess the performance under nonnormality. The
empirical size and power of the test are reported in Tables 2-4.

Table 2

Empirical sizes and powers of the proposed test for the variance-covariance matrices,
based on 1000 replications with normally distributed $\{Z_{ijk}\}$ in Models (4.1) and (4.2)

                                    p
n1 = n2     32       64       128      256      512      700

Sizes
20          0.044    0.054    0.051    0.048    0.051    0.038
50          0.052    0.060    0.033    0.043    0.054    0.049
80          0.054    0.060    0.047    0.048    0.052    0.053
100         0.056    0.049    0.052    0.046    0.049    0.048

Powers
20          0.291    0.256    0.267    0.277    0.282    0.291
50          0.746    0.821    0.830    0.837    0.832    0.849
80          0.957    0.992    0.991    0.998    0.999    0.998
100         0.994    1        0.999    1        1        1
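The data-generating designs (4.1) and (4.2) can be sketched as follows. Two assumptions are made here and are not stated in the text: Gamma(a, b) is read as shape a and scale b, which gives unit variance for both Gamma(4, 0.5) and Gamma(0.5, sqrt(2)), and each observation draws as many extra innovations as there are moving-average lags so that all p coordinates are defined. Function names and the `kind` labels are hypothetical.

```python
import numpy as np


def innovations(rng, size, kind):
    """Mean-zero, unit-variance innovations Z_ijk for the simulation designs."""
    if kind == "normal":
        return rng.standard_normal(size)
    if kind == "gamma_4_0.5":                 # assumed shape 4, scale 0.5
        return rng.gamma(shape=4.0, scale=0.5, size=size) - 4.0 * 0.5
    if kind == "gamma_0.5_sqrt2":             # assumed shape 0.5, scale sqrt(2)
        scale = np.sqrt(2.0)
        return rng.gamma(shape=0.5, scale=scale, size=size) - 0.5 * scale
    raise ValueError(f"unknown innovation type: {kind}")


def moving_average_sample(rng, n, p, thetas, kind="normal"):
    """Rows X_j with X_jk = Z_jk + sum_l thetas[l-1] * Z_{j,k+l}, k = 1..p."""
    q = len(thetas)
    z = innovations(rng, (n, p + q), kind)    # q extra innovations per row
    x = z[:, :p].copy()
    for lag, theta in enumerate(thetas, start=1):
        x += theta * z[:, lag:lag + p]
    return x


rng = np.random.default_rng(2012)
# Power setting with combination (iii): model (4.1) against model (4.2).
sample1 = moving_average_sample(rng, n=20, p=32, thetas=[2.0], kind="normal")
sample2 = moving_average_sample(rng, n=20, p=32, thetas=[2.0, 1.0],
                                kind="gamma_0.5_sqrt2")
```

For size evaluation under this reading, both samples would be generated with `thetas=[2.0]`, changing only the innovation type between the three combinations.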

We observed from Table 1 that the size of the conventional LR test was
grossly distorted, confirming its breakdown under even mild dimensionality,
discovered in Bai et al. (2009). The severely distorted size for the LR test
made its power artificially high. Both the corrected LR test and the proposed
test had quite accurate size approximation to the nominal 5% level for all
cases in Table 1. Both tests enjoyed perfect or nearly perfect power at
$\theta_1 = 0.5$, when the signal strength of the tests was strong. When the
value of $\theta_1$ decreased, both tests had smaller power, although the
proposed test was more powerful than the corrected LR test at
$\theta_1 = 0.3$ and much more so at $\theta_1 = 0.2$, when the signal strength
was weaker.

Table 3

Empirical sizes and powers of the proposed test for the variance-covariance matrices,
based on 1000 replications with Gamma distributed $\{Z_{ijk}\}$ in Models (4.1) and (4.2)

                                    p
n1 = n2     32       64       128      256      512      700

Sizes
20          0.119    0.117    0.069    0.063    0.051    0.040
50          0.150    0.110    0.094    0.052    0.053    0.051
80          0.155    0.111    0.093    0.067    0.064    0.044
100         0.148    0.120    0.084    0.056    0.058    0.053

Powers
20          0.299    0.282    0.290    0.309    0.265    0.277
50          0.574    0.665    0.693    0.750    0.801    0.828
80          0.804    0.886    0.942    0.968    0.991    0.986
100         0.899    0.945    0.986    0.995    0.998    1

Table 4

Empirical sizes and powers of the proposed test for the variance-covariance matrices,
based on 1000 replications with the mixed normal and Gamma distributions for $\{Z_{ijk}\}$
in Models (4.1) and (4.2)

                                    p
n1 = n2     32       64       128      256      512      700

Sizes
20          0.108    0.099    0.076    0.059    0.070    0.050
50          0.117    0.111    0.069    0.068    0.057    0.053
80          0.124    0.099    0.091    0.065    0.064    0.060
100         0.150    0.122    0.085    0.069    0.056    0.047

Powers
20          0.256    0.296    0.278    0.297    0.276    0.295
50          0.606    0.659    0.724    0.766    0.824    0.823
80          0.805    0.890    0.950    0.977    0.989    0.992
100         0.904    0.958    0.982    0.996    0.999    1

The simulation results for the proposed test with dimensions much larger
than the sample sizes and for nonnormally distributed data are reported in
Tables 2-4. We note that the LR tests are not applicable in this setting.
The simulation results show that the proposed test had quite accurate and
robust size approximation over a wide range of dimensionality and
distributions considered in the simulation experiments. The tables also show
that the power of the proposed test was quite satisfactory and increased
as the dimension and the sample sizes became larger.

We then conducted simulations to evaluate the performance of the second
test for $H_{0b}: \Sigma_{1,12} = \Sigma_{2,12}$. We partitioned equally the entire random vector
into two subvectors of $p_1 = p/2$ and $p_2 = p - p_1$. To ensure a sufficient number
of nonzero elements in the off-diagonal sub-matrices $\Sigma_{1,12}$ and $\Sigma_{2,12}$ when the
dimension was increased, we considered a moving average model of order $m_1$,
which is much larger than the orders used in (4.1) and (4.2). In the size
evaluation,

(4.3) $X_{ijk} = Z_{ijk} + \alpha_1 Z_{ijk+1} + \cdots + \alpha_{m_1} Z_{ijk+m_1}$

for $i = 1, 2$ and $j = 1, \ldots, n_i$, where all the $\alpha_l$ coefficients were chosen to be 0.1.
Table 5

Empirical sizes and powers of the proposed test for the covariance between two
sub-vectors, based on 1000 replications for normally distributed $\{Z_{ijk}\}$ in Models (4.3)
and (4.4)

                               p
$n_1 = n_2$    50      100     200     500     700

Sizes
20             0.069   0.071   0.070   0.065   0.077
50             0.064   0.056   0.064   0.063   0.055
80             0.057   0.046   0.056   0.073   0.052
100            0.047   0.062   0.055   0.054   0.048

Powers
20             0.639   0.625   0.628   0.620   0.615
50             0.993   0.994   0.982   0.983   0.989
80             1       1       1       1       1
100            1       1       1       1       1

In the simulation for the power, we generated the first sample according to
the above (4.3) and the second from

(4.4) $X_{ijk} = Z_{ijk} + \beta_1 Z_{ijk+1} + \cdots + \beta_{m_2} Z_{ijk+m_2}$

for $i = 2$ and $j = 1, \ldots, n_2$, where the $\beta_l$ were chosen to be 0.8. We chose the lengths
of the moving average $m_1$ and $m_2$ according to the dimension $p$ such that
as $p$ was increased, the values of $m_1$ and $m_2$ were increased as well. Specifically,
we set $(m_1, m_2, p) = (2, 25, 50), (3, 50, 100), (7, 100, 200), (12, 250, 500)$
and $(18, 300, 700)$, respectively. Two distributions were considered for the
i.i.d. sequences $\{Z_{ijk}\}_{k=1}^{p}$ in (4.3) and (4.4): (i) both sequences were standard
normally distributed; (ii) the centralized Gamma(4, 0.5) for Sample 1
and the centralized Gamma(0.5, $\sqrt{2}$) for Sample 2. The simulation results
for the second test are reported in Table 5 for the normally distributed case
and Table 6 for the Gamma distributed case.
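The moving average designs (4.3) and (4.4) can be sketched in a few lines. The snippet below is only an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, each row uses $p + m$ innovations so that every coordinate has a full window, and Gamma(a, b) is read as shape a and scale b, a reading under which both Gamma(4, 0.5) and Gamma(0.5, $\sqrt{2}$) have unit variance.

```python
import numpy as np

def ma_sample(n, p, m, coef, draw_innovations, rng):
    """Draw n vectors in R^p with X_jk = Z_jk + coef * (Z_j,k+1 + ... + Z_j,k+m)."""
    # Assumption: p + m innovations per row, so each coordinate sees m + 1 terms.
    z = draw_innovations(rng, (n, p + m))
    weights = np.concatenate(([1.0], np.full(m, coef)))
    # windows[j, k, :] holds Z_jk, Z_j,k+1, ..., Z_j,k+m
    windows = np.lib.stride_tricks.sliding_window_view(z, m + 1, axis=1)
    return windows @ weights

def normal_innovations(rng, shape):
    """Standard normal innovations."""
    return rng.standard_normal(shape)

def centered_gamma(shape_param, scale):
    """Centered Gamma innovations (shape/scale reading is an assumption)."""
    def draw(rng, shape):
        return rng.gamma(shape_param, scale, size=shape) - shape_param * scale
    return draw

rng = np.random.default_rng(0)
n, p, m1, m2 = 20, 50, 2, 25                                 # one (m1, m2, p) setting
x1 = ma_sample(n, p, m1, 0.1, normal_innovations, rng)       # model (4.3)
x2 = ma_sample(n, p, m2, 0.8, normal_innovations, rng)       # model (4.4)
x2_gamma = ma_sample(n, p, m2, 0.8, centered_gamma(0.5, np.sqrt(2.0)), rng)
```

Passing the innovation sampler as an argument keeps the normal and Gamma scenarios on the same code path, so only the distribution changes between the two designs.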

We observed from Table 5 that the empirical sizes of the proposed test
converged to the nominal 5% quite rapidly, while the powers were quite
high and quickly increased to 1. For the Gamma distributed case reported
in Table 6, the convergence of the empirical sizes to the nominal level was
slower than in the normally distributed case, indicating that the rate of
convergence to asymptotic normality depends on the underlying distribution, as
well as the sample size and dimensionality. The powers in Table 6 were reasonable,
although they were smaller than in the corresponding normally distributed case
in Table 5. Nevertheless, the power was quite responsive to the increase of p
and the sample sizes.

Table 6

Empirical sizes and powers of the proposed test for the covariances between two
sub-vectors, based on 1000 replications with Gamma distributed $\{Z_{ijk}\}$ in Models (4.3)
and (4.4)

                               p
$n_1 = n_2$    50      100     200     500     700

Sizes
20             0.105   0.092   0.085   0.082   0.082
50             0.101   0.090   0.081   0.088   0.090
80             0.107   0.094   0.083   0.078   0.065
100            0.093   0.083   0.093   0.059   0.071

Powers
20             0.499   0.501   0.519   0.482   0.502
50             0.775   0.802   0.783   0.754   0.777
80             0.945   0.923   0.921   0.922   0.923
100            0.974   0.957   0.969   0.964   0.960

5. An empirical study. We report an empirical study on a leukemia data set,
applying the proposed tests to the variance-covariance matrices. The data
[Chiaretti et al. (2004)], available from http://www.bioconductor.org/,
consist of microarray expressions of 128 patients with either T-cell or B- 
cell acute lymphoblastic leukemia (ALL); see Dudoit, Keles and van der 
Laan (2008) and Chen and Qin (2010) for analysis on the same dataset. We 
considered a subset of the ALL data of 79 patients with the B-cell ALL. 
We were interested in two types of the B-cell tumors: BCR/ABL, one of the 
most frequent cytogenetic abnormalities in human leukemia, and NEG, the 
cytogenetically normal B-cell ALL. The number of patients with BCR/ABL 
was 37 and that with NEG was 42. 

A major motivation for developing the proposed test procedures for high- 
dimensional variance-covariance matrices comes from the need to identify 
sets of genes which are significantly different with respect to two treatments 
in genetic research; see Barry, Nobel and Wright (2005), Efron and Tibshirani
(2007), Newton et al. (2007) and Nettleton, Recknor and Reecy (2008) for 
comprehensive discussions. Biologically speaking, each gene does not func- 
tion individually, but rather tends to work with others to achieve certain 
biological tasks. Gene-sets are technically defined vocabularies which pro- 
duce names of gene-sets (also called GO terms). There are three categories 
of Gene ontologies of interest: Biological Processes (BP), Cellular Compo- 
nents (CC) and Molecular Functions (MF). For the ALL data, a preliminary 
screening with gene-filtering left a total number of 2391 genes for analysis 
with 1599 unique GO terms in BP category, 290 in CC and 357 in MF. 

Let us denote $S_1, \ldots, S_q$ for $q$ gene-sets, where $S_g$ consists of $p_g$ genes.
Let $F_{1S_g}$ and $F_{2S_g}$ be the distribution functions corresponding to $S_g$ under
the treatment and control, and $\mu_{1S_g}$ and $\mu_{2S_g}$ be their respective means,
and $\Sigma_{1S_g}$ and $\Sigma_{2S_g}$ be their respective variance-covariance matrices. Our
first hypotheses of interest are $H_{0g}: \Sigma_{1S_g} = \Sigma_{2S_g}$ for $g = 1, \ldots, q$ regarding
the variance-covariance matrices. For the second hypothesis, we divided each
gene-set into two sub-vectors by selecting the first $[p/2]$ dimensions of the
gene-set as the first segment and the rest as the second.

Fig. 1. Histograms of p-values (left panels) for testing two covariance matrices and test
statistic $L_n$ (right panels) for the three gene-categories. [Figure not reproduced;
recoverable panel title: BP; horizontal axes: P-values and $L_n$.]

Table 7

Number of GO terms which were tested significantly different at the diagonal blocks,
off-diagonal blocks and both diagonal and off-diagonal blocks, respectively

       Diagonal only   Off-diagonal only   Both   Total
BP     115             17                  206    338
CC     26              1                   50     77
MF     22              0                   53     75

We first applied the proposed test for the equality of the entire
variance-covariance matrices and obtained the p-value for each gene-set. The p-values
and the values of the test statistics $L_n$ as given in (2.7) are displayed in
Figure 1 for the three gene-categories. By controlling the false discovery rate
[FDR, Benjamini and Hochberg (1995)] at 0.05, 338 GO terms were declared
significant in the BP category, 77 in the CC and 75 in the MF, indicating
that the dependence structure among the gene-sets was significantly different
between the BCR/ABL and the NEG ALL patients for a large number
of gene sets. That a relatively large number of gene-sets were declared
significant by the proposed test was not entirely surprising, as we observe
from Figure 1 that there were a very large number of p-values very
close to 0.
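The FDR step in the gene-set analysis refers to the Benjamini and Hochberg (1995) procedure. A minimal sketch of that step-up rule is given below for orientation; the function name and the example p-values are hypothetical and are not taken from the ALL data.

```python
import numpy as np

def benjamini_hochberg(p_values, level=0.05):
    """Boolean array marking the hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    q = p.size
    order = np.argsort(p)                          # ranks p-values from smallest to largest
    thresholds = level * np.arange(1, q + 1) / q   # i * level / q for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(q, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        rejected[order[: k + 1]] = True            # reject every hypothesis up to that rank
    return rejected

# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, level=0.05)
print(int(flags.sum()), "of", len(example_p), "hypotheses rejected")
```

In the setting of this section, `p_values` would hold one p-value per GO term within a category, and the number of `True` entries would correspond to the count of significant gene-sets.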

For those GO terms which had been declared as having different
variance-covariance matrices, we carried out a follow-up analysis to gain more
details on the differences by partitioning the variance-covariance into four
blocks in the form of (3.1) with $p_1 = [p/2]$ and $p_2 = p - p_1$. We wanted to know
if the difference was caused by the diagonal blocks or the off-diagonal blocks.
The tests on the two diagonal blocks were conducted using the first proposed
test for the variance-covariance matrix but with $p_1$ or $p_2$ dimensions,
respectively. The tests on the off-diagonal blocks were conducted by employing the
second proposed test for covariances between the two sub-vectors. The
results are summarized in Table 7, which provides the numbers of gene-sets
which were tested significant in the diagonal matrices only, the off-diagonal
matrix only, and both at 5%. There were far more gene-sets which had both
diagonal and off-diagonal matrices being significantly different, and it was
less likely that the off-diagonal matrices were different while the diagonal
matrices were not. It was a little surprising to see that the numbers
of significant gene-sets for the diagonal only, off-diagonal only and both in
each functional category added up to the total numbers exactly for all three
gene-categories.
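The four-block partition used in this follow-up analysis can be illustrated as follows. This is a sketch only: it uses the ordinary sample covariance to show where the blocks sit, whereas the proposed tests work with their own test statistics rather than the sample covariance. The function name is hypothetical, and $[p/2]$ is assumed here to mean the integer part of $p/2$.

```python
import numpy as np

def covariance_blocks(x, p1=None):
    """Split the sample covariance of x (n rows, p columns) into four blocks."""
    x = np.asarray(x, dtype=float)
    p = x.shape[1]
    if p1 is None:
        p1 = p // 2                      # assumption: [p/2] read as the integer part
    s = np.cov(x, rowvar=False)          # p x p sample covariance matrix
    s11, s12 = s[:p1, :p1], s[:p1, p1:]  # first diagonal block, off-diagonal block
    s21, s22 = s[p1:, :p1], s[p1:, p1:]  # transposed off-diagonal block, second diagonal block
    return s11, s12, s21, s22

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 5))         # synthetic data, illustration only
s11, s12, s21, s22 = covariance_blocks(x)
print(s11.shape, s12.shape, s22.shape)
```

The first test is then applied to the two diagonal blocks separately, and the second test to the off-diagonal block.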

As we have stated in the Introduction, the proposed tests are part of the
effort in testing for high-dimensional distributions between two treatments.
However, directly testing on the distribution functions is quite challenging
due to the high dimensionality, as such tests may suffer from low power. A
realistic and intuitive way is to test for simpler characteristics of the distributions,
for instance testing for the means as in Bai and Saranadasa (1996) and Chen
and Qin (2010), and the variance-covariance as considered in this paper. For
the ALL data, in addition to testing for the variance-covariance, we also
carried out tests for the means proposed in Chen and Qin (2010) at a level of
5%. Table 8 contains two by two classifications on the number and the
probability of gene-sets which are rejected/not rejected by the tests for the mean
and the variance, respectively. It is observed that it is far more likely for the
means to be significantly different than the variance-covariance, with the
probability of rejection being around 0.8 for the means versus 0.2 to 0.3 for
the covariance for the three functional categories. Given a gene-set which
was not tested significant for the means, the conditional probability of being
tested significant for the covariance is lower than that given a gene-set which
was tested significant for the means. These were confirmed by conducting
the chi-square test for association for the three gene-set categories, which
rejected overwhelmingly (with p-values all less than 0.0005) the hypothesis
of no-association between being tested significant for the mean and the
variance. For this particular dataset, the tests for the means were quite effective
in disclosing most of the differentially expressed gene-sets. However, we do
see that for Biological Processes and Cellular Component categories, among
those whose means were not declared significantly different, there were about
10% of gene-sets having significantly different covariance structures.

Table 8

Two by two classifications on the number (probability) of GO-terms rejected/not rejected
by the tests for the means and the variances for the three functional categories,
respectively

                            Mean test
Variance test     Rejected         Not rejected

(a) Biological Processes (BP)
Rejected          314 (0.196)      22 (0.015)
Not rejected      1000 (0.625)     263 (0.164)

(b) Cellular Components (CC)
Rejected          77 (0.266)       4 (0.014)
Not rejected      164 (0.566)      45 (0.154)

(c) Molecular Functions (MF)
Rejected          86 (0.241)       1 (0.003)
Not rejected      203 (0.568)      67 (0.188)
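The conditional rejection rates discussed above can be recomputed directly from the counts in Table 8. The sketch below also evaluates a Pearson chi-square statistic for each 2 by 2 table. The text does not say which variant of the chi-square test was used, so the uncorrected Pearson form here is an assumption, and its p-values need not coincide with those behind the reported threshold. The names `table8` and `summarize` are hypothetical.

```python
import math

# Counts from Table 8; rows: covariance test (rejected, not rejected),
# columns: mean test (rejected, not rejected).
table8 = {
    "BP": [[314, 22], [1000, 263]],
    "CC": [[77, 4], [164, 45]],
    "MF": [[86, 1], [203, 67]],
}

def summarize(counts):
    """Conditional rejection rates and Pearson chi-square for one 2x2 table."""
    (a, b), (c, d) = counts
    n = a + b + c + d
    cov_rej_given_mean_rej = a / (a + c)
    cov_rej_given_mean_not_rej = b / (b + d)
    # Pearson statistic for a 2x2 table, without continuity correction (assumption).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return cov_rej_given_mean_rej, cov_rej_given_mean_not_rej, chi2, p_value

for name, counts in table8.items():
    r1, r0, chi2, p_value = summarize(counts)
    print(f"{name}: P(cov rej | mean rej) = {r1:.3f}, "
          f"P(cov rej | mean not rej) = {r0:.3f}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

For each category the rate given a non-significant mean test is `b / (b + d)`, the quantity behind the "about 10%" remark for the BP and CC categories.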

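6. Technical details. As both $T_{n_1,n_2}$ and $S_{n_1,n_2}$ are invariant under the
location transformation, we assume $\mu_1 = \mu_2 = 0$ throughout this section.

6.1. Derivations of $\mathrm{Var}(T_{n_1,n_2})$ and $\mathrm{Var}(S_{n_1,n_2})$. Recall that
$T_{n_1,n_2} = A_{n_1} + A_{n_2} - 2C_{n_1n_2}$. It is straightforward to show that
$E(T_{n_1,n_2}) = \mathrm{tr}\{(\Sigma_1 - \Sigma_2)^2\}$. By noticing that $\mathrm{Cov}(A_{n_1}, A_{n_2}) = 0$,

(6.1) $\mathrm{Var}(T_{n_1,n_2}) = \mathrm{Var}(A_{n_1}) + \mathrm{Var}(A_{n_2}) + 4\mathrm{Var}(C_{n_1n_2}) - 4\mathrm{Cov}(A_{n_1}, C_{n_1n_2}) - 4\mathrm{Cov}(A_{n_2}, C_{n_1n_2})$.

Adopting results from Chen, Zhang and Zhong (2010), for $h = 1$ or 2,

(6.2) $\mathrm{Var}(A_{n_h}) = \frac{4}{n_h^2}\mathrm{tr}^2(\Sigma_h^2) + \frac{8}{n_h}\mathrm{tr}(\Sigma_h^4) + \frac{4\Delta_h}{n_h}\mathrm{tr}(\Gamma_h'\Sigma_h\Gamma_h \circ \Gamma_h'\Sigma_h\Gamma_h) + o\left\{\frac{1}{n_h^2}\mathrm{tr}^2(\Sigma_h^2) + \frac{1}{n_h}\mathrm{tr}(\Sigma_h^4)\right\}$.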

Furthermore, we obtain 

Var(C nin2 ) = — tr^EiEa) + ( — + — ) tr^EsE^ 
nin 2 \ n i n 2/ 



+ —tr(T / 1 r 2 T' 2 T 1 o rirar'aro 



(6.3) 



ni 



+ — tr(r 2 r!rir 2 o r^r^r,) + o{ — tr 2 (E!S 2 ) 1 

ra 2 [nm 2 J 

— + f -p= + - ]\ Var(C nin2 , 



I_L_ 

[ V n l n 2 



By carrying out similar procedures, we are able to obtain $\mathrm{Cov}(A_{n_1}, C_{n_1n_2})$
and $\mathrm{Cov}(A_{n_2}, C_{n_1n_2})$. After we substitute all the results into (6.1),

2 



Var(T n 



i=l 



- tr 2 (s 2 ) + - tr(E?) + ^ tr^r^r, o r^rjr. 



J?-; 



?1; 



~ 2 -^ ^tr^E^or^r, 



tr(E^EiE 2 ) 



?1; 



+ tr 2 (EiE 2 ) + — + — tr(EiS 2 EiE 2 ) 

nin 2 \ n i n 2/ 

4Ai 



(6.4) 



+ ^. t r(rir 2 r' 2 ri o rir 2 r' 2 ri) 
+ l_a tr^ririra o r^r;^) 

n 2 



+ o 



+ o 



1 



■tr 2 (E!E 2 ) 



{ V n i n 2 ^in 2 ^VV^i n i/J 
2 

+ E{ J 2^) + J 3tr 2 (S 2 )) 
Similarly to $T_{n_1,n_2}$, we have $E(S_{n_1,n_2}) = \mathrm{tr}\{(\Sigma_{1,12} - \Sigma_{2,12})(\Sigma_{1,12} - \Sigma_{2,12})'\}$
and the leading order terms in $\mathrm{Var}(S_{n_1,n_2})$ are given by



Var(S" nin2 ) = J2 



2 . o ,„ „/ , 2 



-2 tr 2 (S M2 S' 2 ) + ^tr(£ 2 )tr(£ 2 



(6.5) 



H tr{(Si i i2S' ljl2 - Sj,i 2 S 2il2 ) 2 } 

H tr{(Sj i nSi i i 2 — Sj i nS 2i i 2 )(Ilj j22 S 1 12 — Sj i22 S 2il2 )} 

4A,- 

+ 



m 

xtr{r^'(s 1)12 -E 2!l2 )r 

H — tr 2 (Si i 2 S 2 12 ) H — tr(Si nS 2 ii)tr(Si 22 S 2 22 ). 

nin 2 ' nin 2 

6.2. Proof of Theorem 1. The leading order terms in $\mathrm{Var}(T_{n_1,n_2})$ are
contributed by $A_{n_h,1}$ for $h = 1, 2$ and $C_{n_1n_2,1}$, which are defined by

$A_{n_h,1} = \frac{1}{n_h(n_h - 1)}\sum_{i \ne j}(X_{hi}'X_{hj})^2, \qquad C_{n_1n_2,1} = \frac{1}{n_1n_2}\sum_{i}\sum_{j}(X_{1i}'X_{2j})^2.$

Hence, we only need to study the asymptotic normality of $Z_{n_1,n_2}$, which is
defined by $Z_{n_1,n_2} =: A_{n_1,1} + A_{n_2,1} - 2C_{n_1n_2,1}$.
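For concreteness, the leading terms can be evaluated numerically. The sketch below reads them as $A_{n_h,1} = \{n_h(n_h-1)\}^{-1}\sum_{i \ne j}(X_{hi}'X_{hj})^2$ and $C_{n_1n_2,1} = (n_1n_2)^{-1}\sum_i\sum_j(X_{1i}'X_{2j})^2$, which is an interpretation of the display above, and it assumes mean-zero observations as in this section. The function name is hypothetical. Under these assumptions $E(Z_{n_1,n_2}) = \mathrm{tr}(\Sigma_1^2) + \mathrm{tr}(\Sigma_2^2) - 2\mathrm{tr}(\Sigma_1\Sigma_2) = \mathrm{tr}\{(\Sigma_1 - \Sigma_2)^2\}$.

```python
import numpy as np

def leading_terms(x1, x2):
    """Return (A1, A2, C, Z) with Z = A1 + A2 - 2C for mean-zero samples.

    x1 and x2 hold one observation per row (n_h rows, p columns).
    """
    def a_term(x):
        n = x.shape[0]
        g = x @ x.T                                      # inner products X_i'X_j
        off_diag = (g ** 2).sum() - (np.diag(g) ** 2).sum()
        return off_diag / (n * (n - 1))                  # average over pairs i != j

    n1, n2 = x1.shape[0], x2.shape[0]
    c = ((x1 @ x2.T) ** 2).sum() / (n1 * n2)             # cross-sample term
    a1, a2 = a_term(x1), a_term(x2)
    return a1, a2, c, a1 + a2 - 2.0 * c

rng = np.random.default_rng(0)
sample1 = rng.standard_normal((20, 100))                 # synthetic data, Sigma_1 = I
sample2 = 1.5 * rng.standard_normal((20, 100))           # synthetic data, Sigma_2 = 2.25 I
print(leading_terms(sample1, sample2))
```

Working with Gram matrices keeps the cost at order $n^2 p$ and avoids forming any $p \times p$ matrix, which matters when $p$ is much larger than the sample sizes.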

In order to construct a martingale sequence, it is convenient to have new
random variables $Y_i$ which are defined as

$Y_i = X_{1i}$ for $i = 1, 2, \ldots, n_1$,

$Y_{n_1+j} = X_{2j}$ for $j = 1, 2, \ldots, n_2$.

To construct a martingale difference, we let $\mathcal{F}_0 = \{\emptyset, \Omega\}$, $\mathcal{F}_k = \sigma\{Y_1, \ldots, Y_k\}$
with $k = 1, 2, \ldots, n_1 + n_2$. And let $E_k(\cdot)$ denote the conditional
expectation given $\mathcal{F}_k$. Define $D_{n,k} = (E_k - E_{k-1})Z_{n_1,n_2}$ and it is easy to see that

$Z_{n_1,n_2} - E(Z_{n_1,n_2}) = \sum_{k=1}^{n_1+n_2} D_{n,k}.$

Lemma 1. For any $n$, $\{D_{n,k}, 1 \le k \le n\}$ is a martingale difference
sequence with respect to the $\sigma$-fields $\{\mathcal{F}_k, 1 \le k \le n\}$.

Proof. First of all, it is straightforward to show that $E D_{n,k} = 0$. Next,
by denoting $S_{n,m} = \sum_{k=1}^{m} D_{n,k} = E_m Z_{n_1,n_2} - E Z_{n_1,n_2}$, we have $S_{n,q} = S_{n,m} +
(E_q Z_{n_1,n_2} - E_m Z_{n_1,n_2})$. Then we can show that $E(S_{n,q} \mid \mathcal{F}_m) = S_{n,m}$. This
completes the proof of Lemma 1. □
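The two steps in the proof of Lemma 1 can be written out using only the tower property of conditional expectation. The display below is a sketch in the notation of this section, where $E_k(\cdot)$ is the conditional expectation given $\sigma\{Y_1, \ldots, Y_k\}$, written $\mathcal{F}_k$ here, and $q > m$.

```latex
\begin{align*}
E(D_{n,k} \mid \mathcal{F}_{k-1})
  &= E_{k-1}(E_k Z_{n_1,n_2}) - E_{k-1}(E_{k-1} Z_{n_1,n_2}) \\
  &= E_{k-1} Z_{n_1,n_2} - E_{k-1} Z_{n_1,n_2} = 0, \\[4pt]
E(S_{n,q} \mid \mathcal{F}_m)
  &= S_{n,m} + E_m(E_q Z_{n_1,n_2}) - E_m Z_{n_1,n_2} \\
  &= S_{n,m} + E_m Z_{n_1,n_2} - E_m Z_{n_1,n_2} = S_{n,m}.
\end{align*}
```

The second line uses that $S_{n,m}$ and $E_m Z_{n_1,n_2}$ are measurable with respect to $\mathcal{F}_m$.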
To apply the martingale central limit theorem, we need Lemmas 2 and 3.

Lemma 2. Under Condition A2 and as $\min\{n_1, n_2\} \to \infty$,

$\frac{\sum_{k=1}^{n_1+n_2}\sigma_{n,k}^2}{\mathrm{Var}(Z_{n_1,n_2})} \xrightarrow{p} 1,$

where $\sigma_{n,k}^2 = E_{k-1}(D_{n,k}^2)$.

Proof. To prove Lemma 2, first we can show $E(\sum_{k=1}^{n_1+n_2}\sigma_{n,k}^2) =
\mathrm{Var}(Z_{n_1,n_2})$. Then we will show that as $\min\{n_1, n_2\} \to \infty$,
$\mathrm{Var}(\sum_{k=1}^{n_1+n_2}\sigma_{n,k}^2)/\mathrm{Var}^2(Z_{n_1,n_2}) \to 0$. To this end, we decompose
$\sum_{k=1}^{n_1+n_2}\sigma_{n,k}^2$ into the sum of eight parts,

$\sum_{k=1}^{n_1+n_2}\sigma_{n,k}^2 = R_1 + R_2 + R_3 + R_4 + R_5 + R_6 + R_7 + R_8,$

where with $Q_{1,k-1} = \sum_{i=1}^{k-1}(Y_iY_i' - \Sigma_1)$ and $Q_{2,n_1+l-1} = \sum_{i=1}^{l-1}(Y_{n_1+i}Y_{n_1+i}' - \Sigma_2)$,

m g 

^ = E n 2/ n _^2 tr (Ql,*-l S lQl,fc-l E l) 
"2 g 

+ E "27 Tv? ^ (Q 2, n^l-l^Q 2,^+1-1^2), 

^ n^(n 2 - l) 2 
ni 1fi fc— 1 

^ = E 2, n E^'( s ? - ^El)^}, 



^3 = E 



16 



^ n^(re 2 - 1) 



tr(Q 2> ni+i-iEi) - tr|Q2 irei+ i-iS 2 ^f>*i j Saj 



i? 4 = -^-Etr(y i y;s 2 y i y/E 2 ) - — tris|( E W) 1, 

ni 4A 

R 5 = y~] "27 TT? ^(riQi fc _iri o riQi fe _iri) 

^ nf (m - l) 2 

v 2 - 4A 2 

+ E ~~ 2? 7\9 tr ( r 2 < 32,ni+i-ir2 ^^ni+Z-lL^), 

711 8A 

= E 2/ 1 T\ tr {r'i(Si - s 2 )ri o riQi^iTi}, 
*7 = E



~ n 2 2 (n 2 - 1) 

-tr< r 

and 



tr(r' 2 (52,ni+«-i r 2 ° r 2 E 2 r 2 



2 Q 2 ,n 1+ i-ir 2 o r 2 (±- YiYn r 2 1 



4A ni 8A ni 

= Vtr(r / 2 y i r/r 2 o r^y/r,) 2 - Vtr(r 2 E 2 r 2 o r'^y/r,). 

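Therefore, we need to show that $\mathrm{Var}(R_i) = o\{\mathrm{Var}^2(Z_{n_1,n_2})\}$ for $i = 1, \ldots, 8$.
For $R_1$, there exists a constant $K_1$ such that

$\mathrm{Var}(R_1) \le K_1\{n_1^{-4}\mathrm{tr}^2(\Sigma_1^2)\mathrm{tr}(\Sigma_1^4) + n_2^{-4}\mathrm{tr}^2(\Sigma_2^2)\mathrm{tr}(\Sigma_2^4)\}.$

Then, applying $\mathrm{Var}^2(Z_{n_1,n_2}) \ge \frac{16}{n_1^4}\mathrm{tr}^4(\Sigma_1^2) + \frac{16}{n_2^4}\mathrm{tr}^4(\Sigma_2^2)$ from (2.5), we know

$\frac{\mathrm{Var}(R_1)}{\mathrm{Var}^2(Z_{n_1,n_2})} \le \frac{K_1}{16}\left\{\frac{\mathrm{tr}(\Sigma_1^4)}{\mathrm{tr}^2(\Sigma_1^2)} + \frac{\mathrm{tr}(\Sigma_2^4)}{\mathrm{tr}^2(\Sigma_2^2)}\right\},$

where $\mathrm{tr}(\Sigma_i^4)/\mathrm{tr}^2(\Sigma_i^2) \to 0$ under Condition A2. Thus, $\mathrm{Var}(R_1) =
o\{\mathrm{Var}^2(Z_{n_1,n_2})\}$.

By carrying out similar procedures we can show that the above is true
for $R_i$ with $i = 1, \ldots, 8$. Hence we complete the proof of Lemma 2. □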



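Lemma 3. Under Condition A2, as $\min\{n_1, n_2\} \to \infty$,

$\frac{\sum_{k=1}^{n_1+n_2}E(D_{n,k}^4)}{\mathrm{Var}^2(Z_{n_1,n_2})} \to 0.$

Proof. For the case of $1 \le k \le n_1$, there exists a constant $c$ such that

$\sum_{k=1}^{n_1}E(D_{n,k}^4) \le c[n_1^{-3}\mathrm{tr}^2\{(\Sigma_1^2 - \Sigma_1\Sigma_2)^2\} + n_1^{-5}\mathrm{tr}^4(\Sigma_1^2)].$

Using the results $\mathrm{Var}^2(Z_{n_1,n_2}) \ge 64n_1^{-2}\mathrm{tr}^2\{(\Sigma_1^2 - \Sigma_1\Sigma_2)^2\}$ and
$\mathrm{Var}^2(Z_{n_1,n_2}) \ge 16n_1^{-4}\mathrm{tr}^4(\Sigma_1^2)$ from (2.5) and as $n_1 \to \infty$, we have

$\frac{\sum_{k=1}^{n_1}E(D_{n,k}^4)}{\mathrm{Var}^2(Z_{n_1,n_2})} \le \frac{c}{n_1} \to 0.$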

For the case of n\ < k < n\ + n 2 , there exists a constant tf such that 
ni+H 2 i 

E < tr 4 (S!S 2 ) + tr^CEiEa) tr 2 (S 2 )} 

win- 



k=n\ 



l'"2 
(6.6) + -^ 1 [2tr 2 (E 1 S 2 )tr{(E| - S^) 2 }] + 4tr 4 {(E 2 )} 

+ 4[2tr 2 (£ 2 )tr{(£ 2 - ^if} + 4tr 2 (E!E 2 ) tr 2 (E 2 )]. 

To evaluate the ratio of each individual term in (6.6) to $\mathrm{Var}^2(Z_{n_1,n_2})$,
we simply replace $\mathrm{Var}^2(Z_{n_1,n_2})$ by the corresponding terms in (2.5). Then
under Condition A2 and as $n_2 \to \infty$, $\sum_{k=n_1+1}^{n_1+n_2}E(D_{n,k}^4)/\mathrm{Var}^2(Z_{n_1,n_2}) \to 0$.
Therefore, we complete the proof of Lemma 3. □

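With two sufficient conditions given in Lemmas 2 and 3, we conclude that

$\frac{Z_{n_1,n_2} - E(Z_{n_1,n_2})}{\sqrt{\mathrm{Var}(Z_{n_1,n_2})}} \xrightarrow{d} N(0, 1).$

If we let $\varepsilon_{n_1,n_2} = A_{n_1,2} + A_{n_1,3} + A_{n_2,2} + A_{n_2,3} - 2C_{n_1n_2,2} - 2C_{n_1n_2,3} -
2C_{n_1n_2,4}$, then $T_{n_1,n_2} = Z_{n_1,n_2} + \varepsilon_{n_1,n_2}$. From $\mathrm{Var}(\varepsilon_{n_1,n_2}) = o(\sigma_{n_1,n_2}^2)$,

$\mathrm{Var}\left(\frac{\varepsilon_{n_1,n_2}}{\sigma_{n_1,n_2}}\right) = \frac{\mathrm{Var}(\varepsilon_{n_1,n_2})}{\sigma_{n_1,n_2}^2} \to 0.$

Moreover, $E(\varepsilon_{n_1,n_2}) = 0$. Therefore, $\varepsilon_{n_1,n_2}/\sigma_{n_1,n_2} \xrightarrow{p} 0$. From Slutsky's
Theorem, we complete the proof of Theorem 1.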
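The final step rests on Chebyshev's inequality and on an exact decomposition of the standardized statistic. The display below is a sketch of that reasoning; $\delta$ is an auxiliary constant introduced here for illustration and does not appear in the original argument.

```latex
\[
P\!\left(\left|\frac{\varepsilon_{n_1,n_2}}{\sigma_{n_1,n_2}}\right| > \delta\right)
  \le \frac{\mathrm{Var}(\varepsilon_{n_1,n_2})}{\delta^2\,\sigma_{n_1,n_2}^2} \to 0
  \quad \text{for every fixed } \delta > 0,
\]
\[
\frac{T_{n_1,n_2} - E(T_{n_1,n_2})}{\sigma_{n_1,n_2}}
  = \frac{Z_{n_1,n_2} - E(Z_{n_1,n_2})}{\sigma_{n_1,n_2}}
  + \frac{\varepsilon_{n_1,n_2}}{\sigma_{n_1,n_2}} .
\]
```

The first line uses $E(\varepsilon_{n_1,n_2}) = 0$, so the second moment equals the variance. The second line uses $T_{n_1,n_2} = Z_{n_1,n_2} + \varepsilon_{n_1,n_2}$ together with $E(T_{n_1,n_2}) = E(Z_{n_1,n_2})$.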

6.3. Proof of Theorem 2. Recall that $E(A_{n_h}) = \mathrm{tr}(\Sigma_h^2)$ for $h = 1$ or 2. To
show $A_{n_h}/\mathrm{tr}(\Sigma_h^2) \xrightarrow{p} 1$, it is sufficient to show that $\mathrm{Var}\{A_{n_h}/\mathrm{tr}(\Sigma_h^2)\} \to 0$.
From (6.2), we have

Var ' 



tr(S 2 ) 



< 



tr 2 (S 2 ) 



± tr 2 (E 2 ) + tr(E|) + o\ -I tr 2 (E 2 ) + -1 tr(E 4 ) 



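where $\mathrm{tr}(\Sigma_h^4)/\mathrm{tr}^2(\Sigma_h^2) \to 0$ under Condition A2. Hence, $A_{n_h}/\mathrm{tr}(\Sigma_h^2) \xrightarrow{p} 1$.

Moreover, under $H_{0a}: \Sigma_1 = \Sigma_2 = \Sigma$, $A_{n_h}/\mathrm{tr}(\Sigma^2) \xrightarrow{p} 1$. Then using the
continuous mapping theorem, we have $\hat{\sigma}_{0,n_1,n_2}/\sigma_{0,n_1,n_2} \xrightarrow{p} 1$.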

6.4. Proof of Theorem 3. The leading order terms in $\mathrm{Var}(S_{n_1,n_2})$ are
contributed by $U_{n_h,1}$ and $W_{n_1n_2,1}$, which are defined by

77 1 y(l)/ y(l) y(2)/ y(2) 

nh)1 " n h (n h - 1) ^ u h i h i hi ' 



w - 1 STy^'y^y^'y^ 



nin 2 
From Slutsky's Theorem, we only need to study the asymptotic normality
of $H_{n_1,n_2}$, which is defined as $H_{n_1,n_2} =: U_{n_1,1} + U_{n_2,1} - 2W_{n_1n_2,1}$.

To implement the martingale central limit theorem for $H_{n_1,n_2}$, we need a
martingale sequence. To this end, we define random variables which are

$Y_i^{(1)} = X_{1i}^{(1)}$ and $Y_i^{(2)} = X_{1i}^{(2)}$ for $i = 1, 2, \ldots, n_1$,

$Y_{n_1+j}^{(1)} = X_{2j}^{(1)}$ and $Y_{n_1+j}^{(2)} = X_{2j}^{(2)}$ for $j = 1, 2, \ldots, n_2$.

If we define $C_{n,k} = (E_k - E_{k-1})H_{n_1,n_2}$, where $E_k(\cdot)$ denotes the conditional
expectation given $\mathcal{F}_k = \sigma\{Y_1, \ldots, Y_k\}$ with $k = 1, 2, \ldots, n_1 + n_2$, we claim
that $\{C_{n,k}, 1 \le k \le n\}$ is a martingale difference sequence with respect to
the $\sigma$-fields $\{\mathcal{F}_k, 1 \le k \le n\}$ from Lemma 1. We need Lemmas 4 and 5 to
implement the martingale central limit theorem.

Lemma 4. Under Conditions A2 and A4, as $\min\{n_1, n_2\} \to \infty$,

$\frac{\sum_{k=1}^{n_1+n_2}\tau_{n,k}^2}{\mathrm{Var}(H_{n_1,n_2})} \xrightarrow{p} 1,$

where $\tau_{n,k}^2 = E_{k-1}(C_{n,k}^2)$.

Proof. First, we can show that $E(\sum_{k=1}^{n_1+n_2}\tau_{n,k}^2) = \mathrm{Var}(H_{n_1,n_2})$.
Therefore, we only need to show $\mathrm{Var}(\sum_{k=1}^{n_1+n_2}\tau_{n,k}^2) = o\{\mathrm{Var}^2(H_{n_1,n_2})\}$ to complete
the proof of Lemma 4. To this end, we decompose $\sum_{k=1}^{n_1+n_2}\tau_{n,k}^2$ into twelve
parts,

$\sum_{k=1}^{n_1+n_2}\tau_{n,k}^2 = P_1 + P_2 + P_3 + P_4 + P_5 + P_6 + P_7 + P_8 + P_9 + P_{10} + P_{11} + P_{12},$

where with 

fc-i 

1 , k _ 1 = J2(Yl 1) Yi {2) '-^i,i2) and 
t=i 

2 , ni +l-l = ^2( Y nl+i Y m-li ~ ^2,12), 
i=l 

™ 4 

Pl = n 2 (m - 1) 2 tr ( 0l .*- lS i.i2°i.*-i E i.i2) 

n 2 4 

+ ~~27 T\2 tr (°2,ni+i-lS 2i i202,n 1 +«-l S 2,12)) 
ni . 



P<2 ~ E "2? T\2 tr (°l,fe-l S l,220i ,11; 



n 2 4 



+ E ~~27 1^2 tr (°2,n 1 +«"l S 2,2202 n + i_iS2,ll), 

^ 3 = E ~( TT tr {°l,fc-l S 'l,12( S l,12 - S 2l 12)S' 1 12 }, 

^ nf (m - 1) 
"1 g 

A = >J -37 -rtr{Oi fc _iEi 22(^1 12 - S 2 12 )Si u}, 

1 i=l / 
1 ni \ 

— Vy (1 V (2) ' I 

1 ni \ 
— Vr. (1) y. (2) ' i 

1 1 ) 



2,12 



-ail 



= ^ tr H^ 2 -^r>^ (1) ^ ) ') s - 



xiE 2 ,i2--vy/ 1 V/ 2 > e 2)12 , 



p 8 = ^tr| I E 2 ,i2- 



x I ^2,12 



ni 4Ai 



I \ ^ ^2 i../ r (l)/ n r (2). r (l)' n r (2)x 

f 2^ ra 2 ( ^ n2 _ 1 - ) 2 tr i i 2 ( ^2,n 1 +i-ll 2 oi 2 U 2,ni+l-^ 2 J) 



ni 8A1 



8A 
tJif > U, 12 - g j rf) o r«'o 2ini+ ,_ ir ( : 



«l V (1) V (2)/ , 

For Pi, there exists a constant J\ such that 

2 j 

Var(Pi) < V] -^{tr 2 (S/j i 2 S^ 12 ) tr(E ft nE^ ^E^ 22 E fe 12 ) 
+ tr(E 2 ill )tr(E 2 |22 )tr(E /l 

+ tl ,2 (Eft i iiE/ l) i2S/ l) 22E^ 12 )}. 

Using Var 2 (# ni , n2 ) > ^tr(E 2 )tr(S 2 2 )tr 2 (S /l , 12 S' h 2 ) from (3.8), 

h 

(J 1 /K))tr 2 (E ftil2 S' /ll2 )tr(S, ,11^^,12^^,22 El 12 ) 
Var 2 (F ni , n2 ) 
<^i tr(E/ l!ll E/ l]12 E/ l] 22E / ft 12 ) 



2 



< 



8tr(E 2 )tr(E 2 



22' 



which goes to zero under Condition A4 for h = 1 or 2. 

Similarly, using Var 2 (# ni , n2 ) > 4tr 2 (S 2 n )tr 2 (S 2 22 ) from (3.8), 

h 

■^tr 2 (E ft ii Eft i 2 E/, j2 2E' h i 2 )/Var 2 (F n n2 ) ->■ 0, and 
4 tr(E 2 n ) tr(E 2 j22 ) tr(£ M iE M2 E hi22 E , h ^J/ Var 2 (F nii „ 2 ) -> 0. 

Hence, $\mathrm{Var}(P_1) = o\{\mathrm{Var}^2(H_{n_1,n_2})\}$. Similarly, we have $\mathrm{Var}(P_i) =
o\{\mathrm{Var}^2(H_{n_1,n_2})\}$ for $i = 1, \ldots, 12$. Therefore, we complete the proof of
Lemma 4. □

Lemma 5. Under Conditions A2 and A4, as $\min\{n_1, n_2\} \to \infty$,

$\frac{\sum_{k=1}^{n_1+n_2}E(C_{n,k}^4)}{\mathrm{Var}^2(H_{n_1,n_2})} \to 0.$
Proof. For the case of $1 \le k \le n_1$, there exists a constant $c$ such that



»i 



J>(c£ ifc ) < ^^{EluCSlw - s 2i12 )e 1|22 (s' 1i12 - s' 2)12 )} 
fe=i 

+ nr 5 tr 2 (S? ill )tr 2 (S? i22 )]. 

Applying Var 2 (fl"„ lina ) > 16% 2 tr 2 {Ei,ii (21,12 - S 2 ,i 2 )S lj22 (Si 12 - 2' 2 12 )} 
and Var 2 (# ni , n2 ) > 4n 1 " 4 tr 2 (2 

1 11) tr 2 (E 2 22 ) from (3.8) and as n\ — > 00, 

YJk=l E ( C n,k) < _£_ _^ q 

Var 2 (# ni ,„ 2 ) ~ ni 
For the case of ni < k < n\ + 712, we can find a constant d such that 

ni+ri2 



k=n\ 



d 



< o q tr(Si nS 2 , 11) tr(Si 22^2,22) 



IV n 



1"2 



(6.7) 



H qtr 2 {(S2 l llIl2 l 12 — S 2 nSi ,12)(22,222 2 12 — 22,222'! 12 )} 

d 



1 ? tr(2i ,n2 2 ,ii) tr(Ei 22 E 2 22 ) 

71l?l 2 

x tr{2 2 , 11(22, 12 — 2i,i2)22,22(2' 2j i 2 — 2 112 )} 

+ ^3 tr 2 (2 1 ,i 1 2 2 ,n) tr 2 (2 1 , 22 2 2 ,22) + 4 tr '( S 2 11) *r 2 (2 2 22 ). 
n|n 2 n 2 

To evaluate the ratio of each individual term in (6.7) to $\mathrm{Var}^2(H_{n_1,n_2})$,
we simply replace $\mathrm{Var}^2(H_{n_1,n_2})$ by the corresponding terms in (3.8). Then
we can show that $\sum_{k=n_1+1}^{n_1+n_2}E(C_{n,k}^4)/\mathrm{Var}^2(H_{n_1,n_2}) \to 0$. Therefore, we
complete the proof of Lemma 5. □

With two sufficient conditions given in Lemmas 4 and 5, we know that

$\frac{H_{n_1,n_2} - E(H_{n_1,n_2})}{\sqrt{\mathrm{Var}(H_{n_1,n_2})}} \xrightarrow{d} N(0, 1).$

If we let $\varepsilon_{n_1,n_2} = U_{n_1,2} + U_{n_1,3} + U_{n_2,2} + U_{n_2,3} - 2W_{n_1n_2,2} - 2W_{n_1n_2,3} -
2W_{n_1n_2,4}$, then $S_{n_1,n_2} = H_{n_1,n_2} + \varepsilon_{n_1,n_2}$. From $\mathrm{Var}(\varepsilon_{n_1,n_2}) = o(\sigma_{n_1,n_2}^2)$,

$\mathrm{Var}\left(\frac{\varepsilon_{n_1,n_2}}{\sigma_{n_1,n_2}}\right) = \frac{\mathrm{Var}(\varepsilon_{n_1,n_2})}{\sigma_{n_1,n_2}^2} \to 0.$

Moreover, we know $E(\varepsilon_{n_1,n_2}) = 0$. Therefore, $\varepsilon_{n_1,n_2}/\sigma_{n_1,n_2} \xrightarrow{p} 0$. From
Slutsky's Theorem, we complete the proof of Theorem 3.
6.5. Proof of Theorem 4. Applying the trace inequality, we know that
$\mathrm{tr}^2(\Sigma_{h,12}\Sigma_{h,12}') \le \mathrm{tr}(\Sigma_{h,11}^2)\mathrm{tr}(\Sigma_{h,22}^2)$. Therefore, to prove Theorem 4, we first
consider the case where $\mathrm{tr}^2(\Sigma_{h,12}\Sigma_{h,12}') = O\{\mathrm{tr}(\Sigma_{h,11}^2)\mathrm{tr}(\Sigma_{h,22}^2)\}$. From
Theorem 2, we can show that $A_{n_h}^{(1)}/\mathrm{tr}(\Sigma_{h,11}^2) \xrightarrow{p} 1$ and $A_{n_h}^{(2)}/\mathrm{tr}(\Sigma_{h,22}^2) \xrightarrow{p} 1$.
Moreover, from (6.3), there exists a constant $d_1$ such that



Var{cSn 2 /tr(£i,n£ 2 ,«)} < di ( - + - 

\ri\ ri2 

which with E(C^ 1 ) n2 ) = tr^i^E^) implies that Q^/t^Ei^E^jj) 4 1. 
Similarly, using tr 2 (E/ l] i2E / /l 12 ) = 0{tr(E 2 n ) tr(E^ 22)}' we can ^ n< ^ a con " 
stant d,2 such that 

Var{t/ n/i /tr(E M2 E' M2 )} 

<^-{l + tr(E 2 ill )tr(E 2 !22 )/tr 2 (E ft)12 E^ 12 )} 
-)-0, 

which together with E(f7 n ,J = tr(E/ l!l2 E / /l 12 ) shows that tr(E/ 1) i 2 Ej l 12 ) — 
1 for h = 1 or 2. Hence, if we define 



1 1 



2^— + — \ tr 2 (Ei 2 E' 12 ) and 



^ 2 

w 0,ni,n 2 ,l 



2 1 4 

w 0,ni,n 2 ,2 = 2 J] — tr ( s2 ,ll) tr ( S i,22) + tr(Ei,nE 2) ii) tr(Ei 22 E 2j22 ) 

then under idgfe : 12 = E 2 12 = E12 and from the mapping theorem, 

W 0,ni,7i2 W 0,ni,ri2,l 

2(C/ ni /ni + [/ n2 /n 2 ) 2 
22 2 

W 0,ni,n 2 W 0,ni,n 2 W 0,ni,ra 2 ,l 



(6.8) + 



<n u n 2 ,2 ELi{(2M 2 )XVX?} + (4/(n 1 n 2 ))d 1 1 l 2 C ) (2) 



HI" 2 



,,2 , ,2 

w 0,rn,n 2 w 0,ni,n 2 ,2 



4i. 



Next, we consider tr 2 (E/ l] i2E' /l 12 ) = o{tr(E 2 ! 1;L ) tr(E 2 t 22)}- If we define 



^0,ni,n 2 ,2 = ^T^V^ni I + rr^nil^iil 

i=l * ^ 
then, for a given constant e, we have 



00 



0,ni,ri2 



0,ni,ri2 



V w 0,ni,n 2 
-2 



W 0,ni,n 2 ,2 



00, 



0,ni,ri2 



>s/2 . 



Thus, we only need to show wg |)11>n2il /wg |niina 4 and £o, m ,r* 2 ,2/ w o,ni 

1, respectively. First of all, we know k>o,7ii,n 2 ,2 ,ni,n2 
ond, there exists a constant such that 

-2 , 2 



1 from (6.8). Sec- 



P - 



0,ni, 712,1 



0,ni,n 2 



>|)<4 



Etitr 2 (S M2 S^ 12 ) 
Eiitr(S? 11 )tr(Sf„ 2 ) 



i=i 



Si nw + 



tr 2 (Z^i2E- 12 ) 
mtr(E2 )tr(S2 ) 



which converges to zero under tr^E^E^ 12 ) = o{tr(E? 11 ) tr(E? 22 )}. There- 
fore, we have ooq Ui n Joo\ n n2 A- 1, as claimed by Theorem 4. 